1. SUMMARY OF ADDRESSED TASKS AND ACCOMPLISHMENTS

We have addressed all objectives planned in the proposal. First, we proved asymptotic optimality
of the Generalized SLRT and the Adaptive SLRT for testing multiple composite hypotheses under
very general non-iid stochastic models as the probabilities of errors become small. The results
include Markov, hidden Markov, state-space, and autoregression models as particular cases.
Second, we developed computationally efficient and nearly optimal tests for detecting
unstructured and structured patterns in multi-stream (sensor, channel) systems, assuming that
data are mutually independent between channels but may have a very general non-iid structure
within channels, and that the number of affected channels is unknown and may vary from small to
large. Third, we developed a general Bayesian theory of quickest changepoint detection for general
non-iid stochastic models, assuming a certain stability of the log-likelihood ratio (LLR) process
expressed via the r-complete convergence of the LLR to a finite and positive number, which can be
regarded as the Kullback-Leibler information number. Fourth, we developed a similar minimax
change detection theory, modifying and relaxing previous results of Lai (1998) to complete
convergence of the LLR and considering novel classes of detection procedures that confine the
local maximal conditional probability of a false alarm.

2.  MAIN  RESULTS 

2.1.  Asymptotic  Optimality  Properties  of  the  Multihypothesis  Sequential  Tests 

Consider the following problem of testing multiple composite hypotheses associated with general
non-iid stochastic models. Let $(\Omega, \mathcal{F}, \mathcal{F}_n, P_\theta)$, $n = 1, 2, \ldots$, be a filtered probability space
with standard assumptions about monotonicity of the $\sigma$-algebras $\mathcal{F}_n$. The vector parameter
$\theta = (\theta_1, \ldots, \theta_\ell)$ belongs to a subset $\Theta$ of $\ell$-dimensional Euclidean space. The sub-$\sigma$-algebra
$\mathcal{F}_n = \sigma(X^n)$ of $\mathcal{F}$ is generated by the stochastic process $X^n = (X_1, \ldots, X_n)$ observed up to
time $n$. The hypotheses to be tested are "$H_i : \theta \in \Theta_i$", $i = 0, 1, \ldots, N$ ($N \ge 1$), where $\Theta_i$ are
disjoint subsets of $\Theta$. We will also suppose that there is an indifference zone $I_{\rm in} \subset \Theta$ in which
no constraints are imposed on the probabilities of errors. The indifference zone, where any
decision is acceptable, is usually introduced keeping in mind that the correct action is not critical
and often not even possible when the hypotheses are too close, which is perhaps the case in most,
if not all, practical applications. However, in principle $I_{\rm in}$ may be an empty set. The probability
measures $P_\theta$ and $P_{\tilde\theta}$ are assumed to be locally mutually absolutely continuous. By
$p_\theta(X_n | X^{n-1})$, $n \ge 1$, we denote the corresponding conditional densities, which may depend on $n$.

A multihypothesis sequential test $\delta = (T, d)$ consists of the pair $(T, d)$, where $T$ is a stopping
time with respect to the filtration $\{\mathcal{F}_n\}_{n \ge 0}$, and $d = d_T(X^T) \in \{0, 1, \ldots, N\}$ is an
$\mathcal{F}_T$-measurable (terminal) decision rule specifying which hypothesis is to be accepted once
observations have stopped. Specifically, the hypothesis $H_i$ is accepted if $d = i$ and rejected if $d \ne i$,
i.e., $\{d = i\} = \{T < \infty, \delta \text{ accepts } H_i\}$. The quality of a sequential test is judged on the basis of
its error probabilities and expected sample sizes or, more generally, on the moments of the sample
size. Let $\alpha_{ij}(\delta, \theta) = P_\theta(d = j)$, $\theta \in \Theta_i$ ($i \ne j$, $i, j = 0, 1, \ldots, N$) be the probability of accepting
the hypothesis $H_j$ by the test $\delta$ when the true value of the parameter $\theta$ is fixed and belongs to the
subset $\Theta_i$, and let $\beta_i(\delta, \theta) = P_\theta(d \ne i)$, $\theta \in \Theta_i$, be the probability of rejecting the hypothesis $H_i$
when it is true. Introduce the following two classes of tests

$$C([\alpha_{ij}]) = \Big\{ \delta : \sup_{\theta \in \Theta_i} \alpha_{ij}(\delta, \theta) \le \alpha_{ij} \ \text{for all } i, j = 0, 1, \ldots, N,\ i \ne j \Big\},$$

$$C(\beta) = \Big\{ \delta : \sup_{\theta \in \Theta_i} \beta_i(\delta, \theta) \le \beta_i \ \text{for all } i = 0, 1, \ldots, N \Big\}, \tag{1}$$

for which maximal error probabilities do not exceed the given numbers $\alpha_{ij}$ and $\beta_i$.

The goal is to find tests that are nearly (asymptotically) optimal as $\alpha_{ij} \to 0$ and $\beta_i \to 0$ in
the sense of minimizing the expected sample size $E_\theta T$ or, more generally, higher moments of the
stopping time $E_\theta T^m$, $m \ge 1$, for all parameter values $\theta \in \Theta$.

In the IPR for the grant at USC (Tartakovsky, 2013a), we designed an adaptive matrix sequential
likelihood ratio test (AMSLRT) based on one-stage delayed estimators of the unknown parameters
and proved its asymptotic optimality assuming the strong law of large numbers (SLLN) for the
log-likelihood ratio (LLR) processes. The advantage of this adaptive test over the generalized
sequential likelihood ratio test (GSLRT), which we consider below, is that the error probabilities
are easily controlled (upper-bounded). However, the AMSLRT is inferior to the GSLRT
since there is a loss of information at each stage, and this is expected to degrade its performance,
especially in the vector case where the dimensionality $\ell$ of the parameter is relatively
large.

Below  we  show  that  the  GSLRT  is  also  asymptotically  optimal. 

2.1.1.  The  Multihypothesis  Generalized  Sequential  Likelihood  Ratio  Test 


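Define the generalized LR statistics

$$\hat\Lambda_n^i = \frac{\sup_{\theta \in \Theta} \prod_{k=1}^n p_\theta(X_k | X^{k-1})}{\sup_{\theta \in \Theta_i} \prod_{k=1}^n p_\theta(X_k | X^{k-1})} = \frac{\prod_{k=1}^n p_{\hat\theta_n}(X_k | X^{k-1})}{\sup_{\theta \in \Theta_i} \prod_{k=1}^n p_\theta(X_k | X^{k-1})}, \quad i = 0, 1, \ldots, N, \tag{2}$$

where $\hat\theta_n = \arg\sup_{\theta \in \Theta} p_\theta(X^n)$ is the maximum likelihood estimator (MLE). The Multihypothesis
Generalized Sequential Likelihood Ratio Test (MGSLRT) is of the form

$$\text{stop at the first } n \ge 1 \text{ such that for some } i,\ \hat\Lambda_n^j \ge A_{ji} \text{ for all } j \ne i \tag{3}$$

and accept the (unique) $H_i$ that satisfies these inequalities, where $A_{ij}$ are positive and finite
numbers (thresholds).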

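Note that the MGSLRT $\delta^* = (T^*, d^*)$ given by (3) can also be represented as follows:

$$T^* = \min_{0 \le i \le N} T_i, \qquad d^* = i \ \text{if } T^* = T_i, \tag{4}$$

where

$$T_i = \inf\Big\{ n \ge 1 : \hat\ell_n \ge \max_{0 \le j \le N,\ j \ne i} [\ell_n^j + a_{ji}] \Big\}, \quad a_{ji} = \log A_{ji}, \quad i = 0, 1, \ldots, N, \tag{5}$$

$$\hat\ell_n = \sum_{k=1}^n \log p_{\hat\theta_n}(X_k | X^{k-1}), \qquad \ell_n^j = \sup_{\theta \in \Theta_j} \sum_{k=1}^n \log p_\theta(X_k | X^{k-1}).$$

2.1.2. Near Optimality of the GSLRT

In the following, we will write $\alpha_{ij}(\theta) = \alpha_{ij}(\delta^*, \theta)$ and $\beta_i(\theta) = \beta_i(\delta^*, \theta)$ for the probabilities of
errors of the MGSLRT.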

The developed asymptotic hypothesis testing theory is based on the SLLN and rates of convergence
in the strong law for the LLR processes, specifically by strengthening the strong law into the
r-quick version.

Definition 1. Let $P$ be a probability measure and $E$ the corresponding expectation. For $r > 0$, the
random variable $Y_n$ is said to converge $P$-$r$-quickly to a constant $q$ if $E L_\varepsilon^r < \infty$ for all $\varepsilon > 0$, where
$L_\varepsilon = \sup\{n : |Y_n - q| > \varepsilon\}$ ($\sup \varnothing = 0$).

Note that $P(L_\varepsilon < \infty) = 1$ for all $\varepsilon > 0$ is equivalent to the $P$-a.s. convergence of $Y_n$ to $q$.
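As an illustration of Definition 1, the following minimal Python sketch computes the last exit time $L_\varepsilon$ along one finite sample path. The function name `last_exit_time` and the example path are assumptions made for illustration only; on a truncated path the returned value is merely a lower bound for the true $L_\varepsilon$.

```python
from typing import Sequence


def last_exit_time(y: Sequence[float], q: float, eps: float) -> int:
    """Return sup{n >= 1 : |Y_n - q| > eps} along a finite path.

    The path y holds Y_1, Y_2, ...; the result is 0 when the set is empty,
    matching the convention sup(empty set) = 0 in Definition 1.
    """
    last = 0
    for n, value in enumerate(y, start=1):
        if abs(value - q) > eps:
            last = n
    return last


# Hypothetical deterministic path Y_n = 1 + 1/n, which converges to q = 1.
path = [1.0 + 1.0 / n for n in range(1, 101)]
print(last_exit_time(path, q=1.0, eps=0.15))  # 6, since 1/n > 0.15 only for n <= 6
```

Averaging `last_exit_time(...) ** r` over many simulated paths gives a rough Monte Carlo check of whether $E L_\varepsilon^r$ looks finite for a given model.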
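Define the LLR process

$$\lambda_n(\theta, \tilde\theta) = \log \frac{dP_\theta^n}{dP_{\tilde\theta}^n} = \sum_{k=1}^n \log \frac{p_\theta(X_k | X^{k-1})}{p_{\tilde\theta}(X_k | X^{k-1})}$$

and assume that there exist positive and finite numbers $I(\theta, \tilde\theta)$ such that

$$\frac{1}{n} \lambda_n(\theta, \tilde\theta) \xrightarrow[n \to \infty]{P_\theta\text{-}r\text{-quickly}} I(\theta, \tilde\theta) \quad \text{for all } \theta, \tilde\theta \in \Theta,\ \theta \ne \tilde\theta. \tag{6}$$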

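In addition, we certainly need some conditions on the behavior of the MLE $\hat\theta_n$ for large $n$, which
should converge to the true value $\theta$ in a proper way. To this end, we require the following condition
on the generalized LR process (the LR with $\theta$ replaced by the MLE $\hat\theta_n$):

$$\frac{1}{n} \log \hat\Lambda_n(\tilde\theta) \xrightarrow[n \to \infty]{P_\theta\text{-}r\text{-quickly}} I(\theta, \tilde\theta) \quad \text{for all } \theta, \tilde\theta \in \Theta,\ \theta \ne \tilde\theta, \tag{7}$$

so that the LLR normalized by $n$ converges to the same constants whether it is tuned to the true
parameter value or to its estimate. In certain cases, but not always, conditions (6) and (7) imply the
following conditions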

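$$\frac{1}{n} \log \hat\Lambda_n^i \xrightarrow[n \to \infty]{P_\theta\text{-}r\text{-quickly}} I_i(\theta) \quad \text{for all } \theta \in \Theta \setminus \Theta_i,\ i = 0, 1, \ldots, N, \tag{8}$$

where $I_i(\theta) = \inf_{\tilde\theta \in \Theta_i} I(\theta, \tilde\theta)$ (the minimal "distance" from $\theta$ to the set $\Theta_i$) is assumed to be
positive for all $i$. Write $\alpha_{\max} = \max_{i,j} \alpha_{ij}$ and $\beta_{\max} = \max_i \beta_i$ and define

$$J_i(\theta) = \min_{0 \le j \le N,\ j \ne i} [I_j(\theta)/c_{ji}] \ \text{for } \theta \in \Theta_i, \qquad J(\theta) = \max_{0 \le i \le N} J_i(\theta) \ \text{for } \theta \in I_{\rm in}, \tag{9}$$

and

$$J_i^*(\theta) = \min_{0 \le j \le N,\ j \ne i} [I_j(\theta)/c_j] \ \text{for } \theta \in \Theta_i, \qquad J^*(\theta) = \max_{0 \le i \le N} \min_{j \ne i} [I_j(\theta)/c_j] = \max_{0 \le i \le N} J_i^*(\theta) \ \text{for } \theta \in I_{\rm in}, \tag{10}$$

where

$$c_{ij} = \lim_{\alpha_{\max} \to 0} |\log \alpha_{ij}| / |\log \alpha_{\max}|, \qquad c_j = \lim_{\beta_{\max} \to 0} |\log \beta_j| / |\log \beta_{\max}|.$$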
Theorem 2 below establishes uniform asymptotic optimality of the MGSLRT in the general
non-iid case with respect to moments of the stopping time distribution. The proof is based on the
technique developed by Tartakovsky (1998) for multiple simple hypotheses. It includes a two-step
procedure: first we obtain the asymptotic lower bounds for moments of the stopping time
distribution $\inf_{\delta \in C([\alpha_{ij}])} E_\theta[T]^m$, $\theta \in \Theta_i$, $m > 0$, $i = 0, 1, \ldots, N$, and then we show that these lower
bounds are attained for the MGSLRT. The asymptotic lower bounds are given in the following
theorem.

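Theorem 1 (Asymptotic Lower Bounds). Assume that there are positive and finite numbers $I(\theta, \tilde\theta)$
such that

$$\frac{1}{n} \lambda_n(\theta, \tilde\theta) \xrightarrow[n \to \infty]{P_\theta\text{-a.s.}} I(\theta, \tilde\theta) \quad \text{for all } \theta, \tilde\theta \in \Theta,\ \theta \ne \tilde\theta. \tag{11}$$

Let $I_i(\theta) = \inf_{\tilde\theta \in \Theta_i} I(\theta, \tilde\theta)$ and suppose $\min_{0 \le i \le N} I_i(\theta) > 0$. Then, for all $\theta \in \Theta$ and $0 < \varepsilon < 1$,

$$\inf_{\delta \in C([\alpha_{ij}])} P_\theta\{T > \varepsilon A_\theta([\alpha_{ij}])\} \to 1 \ \text{as } \alpha_{\max} \to 0,$$

$$\inf_{\delta \in C(\beta)} P_\theta\{T > \varepsilon A_\theta(\beta)\} \to 1 \ \text{as } \beta_{\max} \to 0, \tag{12}$$

and therefore, for all $m > 0$ and $\theta \in \Theta$,

$$\inf_{\delta \in C([\alpha_{ij}])} E_\theta T^m \ge [A_\theta([\alpha_{ij}])]^m (1 + o(1)) \ \text{as } \alpha_{\max} \to 0,$$

$$\inf_{\delta \in C(\beta)} E_\theta T^m \ge [A_\theta(\beta)]^m (1 + o(1)) \ \text{as } \beta_{\max} \to 0, \tag{13}$$

where

$$A_\theta([\alpha_{ij}]) = \begin{cases} |\log \alpha_{\max}| / J_i(\theta) & \text{for } \theta \in \Theta_i \text{ and } i = 0, 1, \ldots, N, \\ |\log \alpha_{\max}| / J(\theta) & \text{for } \theta \in I_{\rm in}, \end{cases}$$

and

$$A_\theta(\beta) = \begin{cases} |\log \beta_{\max}| / J_i^*(\theta) & \text{for } \theta \in \Theta_i \text{ and } i = 0, 1, \ldots, N, \\ |\log \beta_{\max}| / J^*(\theta) & \text{for } \theta \in I_{\rm in}. \end{cases}$$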

Next, strengthening the SLLN (11) into the r-quick version, it can be shown that the lower
bounds (13) are attained by the MGSLRT if the thresholds are selected appropriately. The following
theorem spells out details.


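Theorem 2 (MGSLRT Asymptotic Optimality). Assume that r-quick convergence conditions (6)
and (8) are satisfied.

(i) If the thresholds $A_{ij}$ are so selected that $\sup_{\theta \in \Theta_i} \alpha_{ij}(\theta) \le \alpha_{ij}$ and $\log A_{ij} \sim \log(1/\alpha_{ij})$, then
for $m \le r$ as $\alpha_{\max} \to 0$

$$\inf_{\delta \in C([\alpha_{ij}])} E_\theta T^m \sim E_\theta[T^*]^m \sim \begin{cases} [|\log \alpha_{\max}| / J_i(\theta)]^m & \text{for all } \theta \in \Theta_i \text{ and } i = 0, 1, \ldots, N, \\ [|\log \alpha_{\max}| / J(\theta)]^m & \text{for all } \theta \in I_{\rm in}, \end{cases} \tag{14}$$

where the functions $J_i(\theta)$, $J(\theta)$ are defined as in (9).

(ii) If the thresholds $A_{ij} = A_i$ are so selected that $\sup_{\theta \in \Theta_i} \beta_i(\theta) \le \beta_i$ and $\log A_i \sim \log(1/\beta_i)$,
then for $m \le r$ as $\beta_{\max} \to 0$

$$\inf_{\delta \in C(\beta)} E_\theta T^m \sim E_\theta[T^*]^m \sim \begin{cases} [|\log \beta_{\max}| / J_i^*(\theta)]^m & \text{for all } \theta \in \Theta_i \text{ and } i = 0, 1, \ldots, N, \\ [|\log \beta_{\max}| / J^*(\theta)]^m & \text{for all } \theta \in I_{\rm in}, \end{cases} \tag{15}$$

where the functions $J_i^*(\theta)$, $J^*(\theta)$ are defined as in (10).

Consequently, the MGSLRT minimizes asymptotically the moments of the sample size up to
order $r$ uniformly for all $\theta \in \Theta$ in the classes of tests $C([\alpha_{ij}])$ and $C(\beta)$.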

Remark 1. One of the most important issues is to obtain upper bounds and approximations for
error probabilities of the MGSLRT. However, we do not know how to upper-bound the error
probabilities of the MGSLRT. The reason is that the statistics $\hat\Lambda_n^i$ are not likelihood ratios anymore, so
that the change-of-measure argument (Wald's likelihood ratio identity) cannot be applied. Some
asymptotic approximations can still be obtained in the iid case for $\ell$-dimensional exponential
families using large and moderate deviations:


sup  P o{d  =  j)  =  - f  0(1)  as  min  A^  — »  oo 

(H  (-),  Aji  ij 


(16) 


(cf. Chan and Lai (2000); Lorden (1977)). In the general non-iid case this is still an open problem.


Remark 2. The assertions of Theorem 2 remain true if the normalization by $n$ in (8) is replaced
with the normalization by $\psi(n)$, where $\psi(t)$ is an increasing function, $\psi(\infty) = \infty$, in which case
$[|\log \alpha_{\max}| / J_i(\theta)]^m$ in (14) should be replaced with $\Psi([|\log \alpha_{\max}| / J_i(\theta)]^m)$, where $\Psi$ is inverse
to $\psi$, and similarly in (15).


2.2.  Detection  of  Structured  and  Unstructured  Patterns  in  Multiple  Data  Streams 


Rapid signal detection in multistream data or multichannel systems is widely applicable. For
example, in the medical sphere, decision-makers must quickly detect an epidemic present in only
a fraction of hospitals and other sources of data (Chang, 2003; Sonesson and Bock, 2003; Tsui
et al., 2012). In environmental monitoring where a large number of sensors cover a given area,
decision-makers seek to detect an anomalous behavior, such as the presence of hazardous
materials or intruders, that only a fraction of sensors typically capture (Fienberg and Shmueli, 2005;
Rolka et al., 2007). In military defense applications, there is a need to detect an unknown
number of targets in noisy observations obtained by radars, sonars or optical sensors that are typically
multichannel in range, velocity and space (Bakut et al., 1963; Tartakovsky and Brown, 2008). In
cyber security, there is a need to rapidly detect and localize malicious activity, such as distributed
denial-of-service attacks, typically in multiple data streams (Szor, 2005; Tartakovsky, 2014;
Tartakovsky et al., 2006a,b). In genomic applications, there is a need to determine intervals of copy
number variations, which are short and sparse, in multiple DNA sequences (Siegmund, 2013).

Motivated by these and other applications, we consider a general sequential detection problem
where observations are acquired sequentially in a number of data streams. The goal is to quickly
detect the presence of a signal while controlling the probabilities of false alarms (type-I error)
and missed detection (type-II error) below user-specified levels. Two scenarios are of particular
interest for applications. The first is when a single signal with an unknown location is distributed
over a relatively small number of channels. For example, this may be the case when detecting
an extended target with an unknown location in a sequence of images produced by a very
high-resolution sensor. We call this the "structured" case, since there is a certain geometrical structure
we can know at least approximately. A different, completely "unstructured" scenario is when an
unknown number of "point" signals affect the channels. For example, in many target detection
applications, an unknown number of point targets appear in different channels (or data streams),
and it is unknown in which channels the signals will appear (Tartakovsky, 2013c). The multistream
sequential detection problem is well studied only in the case of a single point signal present in
one (unknown) data stream (Tartakovsky et al., 2003a). However, as mentioned above, in many
applications, a signal (or signals) can affect multiple data streams (e.g., when detecting an unknown
number of targets in multichannel sensor systems). In fact, the affected subset could be completely
unknown (unknown number of signals), or known partially (e.g., knowing its size or an upper
bound on its size such as a known maximal number of signals that can appear).

Our goal is to develop a general asymptotic optimality theory without assuming iid observations
in the channels. Assuming a very general non-iid model, we focus on two multichannel
sequential tests, the Generalized Sequential Likelihood Ratio Test (G-SLRT) and the Mixture
Sequential Likelihood Ratio Test (M-SLRT), which are based on the maximum and average
likelihood ratio over all possibly affected subsets, respectively. We impose minimal conditions on the
structure of the observations in channels, postulating only a certain asymptotic stability of the
corresponding log-likelihood ratio statistics. Specifically, we assume that the suitably normalized
log-likelihood ratios in channels almost surely converge to positive and finite numbers, which can be
viewed as local limiting Kullback-Leibler information numbers. We additionally show that if the
local log-likelihood ratios also have independent increments, both the G-SLRT and the M-SLRT
minimize asymptotically not only the expected sample size but also every moment of the sample
size distribution as the probabilities of errors vanish. Thus, we extend a result previously shown
only in the case of i.i.d. observations and in the special case of a single affected stream (Tartakovsky
et al., 2003a). In the general case where the local log-likelihood ratios do not have independent
increments, we require a certain rate of convergence in the Strong Law of Large Numbers, which
is expressed in the form of r-complete convergence (cf. Tartakovsky et al., 2014b, Ch. 2). Under
this condition, we prove that both the G-SLRT and the M-SLRT asymptotically minimize the first
r moments of the sample size distribution. The r-complete convergence condition is a relaxation
of the r-quick convergence condition used in Tartakovsky et al. (2003a) (in the special case of
detecting a single signal in a multichannel system). However, its main advantage is that it is much
easier to verify in practice. Finally, we show that both the G-SLRT and the M-SLRT are
computationally feasible, even with a large number of channels, when we have an upper and a lower bound
on the number of signals, a general set-up that includes cases of complete ignorance as well as
cases where the size of the affected subset is known.

Suppose that observations are sequentially acquired over time in $N$ distinct sources (data
streams, channels, sensors). We denote the observations in the $k$th data stream as $X^k := \{X_n^k\}_{n \ge 1}$,
$k = 1, \ldots, N$. For every $k$, we assume that either $P^k = P_0^k$ or $P^k = P_1^k$, where $P^k$ is the "true"
distribution of $X^k$ and $P_1^k$ and $P_0^k$ are two locally equivalent probability measures on the canonical
space of $X^k$, i.e., $P_1^k \ll P_0^k$ and $P_0^k \ll P_1^k$ when both probability measures are restricted to
$\mathcal{F}_n^k = \sigma(X_s^k;\ 0 \le s \le n)$ for every $n \ge 0$. We denote by $\Lambda_n^k$ the Radon-Nikodym derivative
(likelihood ratio) of $P_1^k$ versus $P_0^k$ given $\mathcal{F}_n^k$ and by $Z_n^k$ the corresponding LLR, i.e.,

$$\Lambda_n^k = \frac{dP_1^k}{dP_0^k}\Big|_{\mathcal{F}_n^k} \quad \text{and} \quad Z_n^k = \log \Lambda_n^k.$$

One possible and useful interpretation is that there is "noise" in source $k$ under $P_0^k$ and "signal"
and noise otherwise (object/target appearance in noise). Alternatively, one may think about $P_0^k$ as a
probability measure corresponding to a "normal" scenario, while $P_1^k$ corresponds to an "abnormal"
scenario when the $k$th data stream is affected by some event (malicious/unusual activity/behavior
in social networks, bio-chemical threat appearance, attacks in computer networks, etc.). We want
to test the global null hypothesis $H_0 : P^k = P_0^k$, $1 \le k \le N$, according to which there is only noise
in all data streams, against the alternative that a signal is present in a subset of data streams that
belongs to a class $\mathcal{P}$. Thus, the alternative hypothesis takes the form $H_1 := \bigcup_{A \in \mathcal{P}} H^A$, where the
distribution of $X^k$ under $H^A$ is

$$P^k = \begin{cases} P_0^k & \text{when } k \notin A, \\ P_1^k & \text{when } k \in A. \end{cases}$$

Assuming that the observations from different data streams are mutually independent, which will
be our standing assumption from now on, the distribution of $X = (X^1, \ldots, X^N)$ under $H_0$ is
described by the product measure $P_0 = P_0^1 \times \cdots \times P_0^N$. On the other hand, the distribution of $X$
when signal is present in subset $A$ takes the form

$$P^A = \prod_{k \in A} P_1^k \times \prod_{k \notin A} P_0^k.$$

Equivalently, for any given $n$ and subset $A \in \mathcal{P}$, we have:

$$Z_n^A := \log \frac{dP^A}{dP_0}\Big|_{\mathcal{F}_n} = \sum_{k \in A} Z_n^k.$$

The goal is to find a pair $\delta = (T, d)$ that consists of an $\{\mathcal{F}_n\}$-stopping time $T$ and an
$\mathcal{F}_T$-measurable random variable $d$ taking values in $\{0, 1\}$, so that $H_i$ is selected on $\{d = i, T < \infty\}$,
$i = 0, 1$, where $\{\mathcal{F}_n\}$ is the filtration generated by all sources of observations, i.e.,

$$\mathcal{F}_n = \bigvee_{1 \le k \le N} \mathcal{F}_n^k = \sigma(X_s^k;\ 0 \le s \le n,\ 1 \le k \le N).$$

Specifically, the goal is to find a sequential test that (a) controls type-I and type-II error probabilities
below $\alpha$ and $\beta$, respectively, i.e., belongs to the class of tests

$$C_{\alpha,\beta}(\mathcal{P}) = \Big\{ \delta : P_0(d = 1) \le \alpha \ \text{and} \ \sup_{A \in \mathcal{P}} P^A(d = 0) \le \beta \Big\},$$

and (b) is asymptotically optimal as $\alpha, \beta \to 0$ in the sense that it attains

$$\inf_{(T,d) \in C_{\alpha,\beta}(\mathcal{P})} E_0 T \quad \text{and} \quad \inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E^A T \quad \forall\, A \in \mathcal{P}.$$

More generally, we are interested in establishing conditions under which a specific sequential test
$\delta_0 = (T_0, d_0)$ is first-order asymptotically optimal with respect to higher moments of the stopping
time distribution, i.e., for all $0 < m \le r$ and some $r > 1$

$$\lim_{\alpha,\beta \to 0} \frac{\inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E_0 T^m}{E_0 T_0^m} = 1 \quad \text{and} \quad \lim_{\alpha,\beta \to 0} \frac{\inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E^A T^m}{E^A T_0^m} = 1 \quad \forall\, A \in \mathcal{P}.$$

Of course, the answer to this question depends heavily on the class of alternatives $\mathcal{P}$. We will
only assume that there is a lower bound ($\underline{m} \ge 1$) and an upper bound ($\overline{m} \le N$) on the cardinality
of the subset of affected data streams, i.e.,

$$\mathcal{P} = \{A : \underline{m} \le |A| \le \overline{m}\}. \tag{17}$$

This sequential testing problem is well understood when the signal can be present in at most
one data stream ($\overline{m} = 1$). Specifically, in this case, the optimality of the GSLRT was established
by Tartakovsky et al. (2003b) under general conditions on the underlying distributions.

In this project, we propose the GSLRT and the Weighted SLRT (WSLRT) that are feasible
for a large number of data streams on one hand and asymptotically optimal on the other hand. In
addition, error probabilities of these tests can be explicitly controlled.

2.2.1.  Asymptotic  Optimality  of  the  G-SLRT 


We begin with establishing lower bounds for moments of the stopping time distribution. Recall that
we consider very general non-iid models for the observations in "channels," so the LLR
processes $Z_n^k$, $k = 1, \ldots, N$, have no particular structure. However, to obtain some meaningful
results, certain assumptions have to be made. We formulate these assumptions in the form of a
certain stability of the behavior of the LLRs for large $n$. Specifically, in the following we suppose
that there are positive and finite numbers $I_0^k$ and $I_1^k$ such that the normalized LLRs $n^{-1} Z_n^k$,
$k = 1, \ldots, N$, converge in probability to $-I_0^k$ under $P_0^k$ and to $I_1^k$ under $P_1^k$,

$$\frac{1}{n} Z_n^k \xrightarrow[n \to \infty]{P_0^k} -I_0^k, \qquad \frac{1}{n} Z_n^k \xrightarrow[n \to \infty]{P_1^k} I_1^k, \qquad k = 1, \ldots, N, \tag{18}$$

in which case also

$$\frac{1}{n} Z_n^A \xrightarrow[n \to \infty]{P_0} -I_0^A, \qquad \frac{1}{n} Z_n^A \xrightarrow[n \to \infty]{P^A} I_1^A, \tag{19}$$

where

$$I_0^A = \sum_{k \in A} I_0^k \quad \text{and} \quad I_1^A = \sum_{k \in A} I_1^k.$$

The following theorem establishes asymptotic lower bounds for all positive moments of the
stopping time distribution in the class $C_{\alpha,\beta}(\mathcal{P})$. We write $\alpha_{\max} = \max(\alpha, \beta)$.

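Theorem 3. Assume there exist positive and finite numbers $I_0^k$ and $I_1^k$ such that, for all $\varepsilon > 0$ and
$k = 1, \ldots, N$,

$$\lim_{M \to \infty} P_1^k\Big\{ \frac{1}{M} \max_{1 \le n \le M} Z_n^k < (1 + \varepsilon) I_1^k \Big\} = 1, \qquad \lim_{M \to \infty} P_0^k\Big\{ \frac{1}{M} \max_{1 \le n \le M} (-Z_n^k) < (1 + \varepsilon) I_0^k \Big\} = 1. \tag{20}$$

Then, for all $m > 0$,

$$\liminf_{\alpha_{\max} \to 0} \frac{\inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E_0 T^m}{|\log \beta|^m} \ge \Big( \frac{1}{\min_{A \in \mathcal{P}} I_0^A} \Big)^m, \qquad \liminf_{\alpha_{\max} \to 0} \frac{\inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E^A T^m}{|\log \alpha|^m} \ge \Big( \frac{1}{I_1^A} \Big)^m \quad \forall\, A \in \mathcal{P}. \tag{21}$$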

When $\mathcal{P} = \{A\}$, i.e., there is no uncertainty regarding the subset of streams in which the signal
may be present, the asymptotic lower bounds (21) are attained by the Sequential Probability Ratio
Test (SPRT),

$$T^A = \inf\{n \ge 1 : Z_n^A \notin (-a, b)\}, \qquad d^A = \begin{cases} 1 & \text{when } Z_{T^A}^A \ge b, \\ 0 & \text{when } Z_{T^A}^A \le -a, \end{cases} \tag{22}$$

under r-quick convergence conditions for the LLRs, which can be deduced from Lai (1981);
Tartakovsky (1998); Tartakovsky et al. (2014a). To be specific, for $\varepsilon > 0$, introduce the last entry
times

$$L_0^k(\varepsilon) = \sup\{n \ge 1 : |n^{-1} Z_n^k + I_0^k| > \varepsilon\} \quad \text{and} \quad L_1^k(\varepsilon) = \sup\{n \ge 1 : |n^{-1} Z_n^k - I_1^k| > \varepsilon\}$$

($\sup \varnothing = 0$) and assume that for some $r > 0$,

$$E_0^k[L_0^k(\varepsilon)]^r < \infty \quad \text{and} \quad E_1^k[L_1^k(\varepsilon)]^r < \infty, \qquad k = 1, \ldots, N. \tag{23}$$

According to Definition 1, conditions (23) mean that the normalized LLRs $n^{-1} Z_n^k$, $k = 1, \ldots, N$,
converge to $-I_0^k$ and $I_1^k$ r-quickly under $P_0^k$ and $P_1^k$, respectively.

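Obviously, conditions (23) imply the corresponding r-quick convergence of $n^{-1} Z_n^A$:

$$E_0[L_0^A(\varepsilon)]^r < \infty \quad \text{and} \quad E^A[L_1^A(\varepsilon)]^r < \infty, \tag{24}$$

where $L_0^A(\varepsilon) = \sup\{n \ge 1 : |n^{-1} Z_n^A + I_0^A| > \varepsilon\}$ and $L_1^A(\varepsilon) = \sup\{n \ge 1 : |n^{-1} Z_n^A - I_1^A| > \varepsilon\}$.

If the thresholds $b$ and $a$ are selected so that $(T^A, d^A) \in C_{\alpha,\beta}(A)$ and $b \sim |\log \alpha|$, $a \sim |\log \beta|$,
in particular $b = |\log \alpha|$ and $a = |\log \beta|$, then using (Tartakovsky et al., 2014a, Theorem 3.4.2)
yields, for all $0 < m \le r$ as $\alpha_{\max} \to 0$,

$$\inf_{\delta \in C_{\alpha,\beta}(A)} E_0[T]^m \sim E_0[T^A]^m \sim \Big( \frac{|\log \beta|}{I_0^A} \Big)^m, \qquad \inf_{\delta \in C_{\alpha,\beta}(A)} E^A[T]^m \sim E^A[T^A]^m \sim \Big( \frac{|\log \alpha|}{I_1^A} \Big)^m. \tag{25}$$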
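The SPRT stopping rule (22) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sprt`, the convention of feeding LLR increments one at a time, and the return code `-1` for an exhausted data stream are hypothetical choices, not part of the report.

```python
from typing import Iterable, Tuple


def sprt(llr_increments: Iterable[float], a: float, b: float) -> Tuple[int, int]:
    """Run an SPRT on a stream of LLR increments.

    The cumulative LLR Z_n is the running sum of the increments. Sampling
    stops the first time Z_n leaves the interval (-a, b).
    Returns (stopping time, decision): decision 1 if Z_n >= b, decision 0 if
    Z_n <= -a, and decision -1 if the data ran out before a boundary was hit.
    """
    z = 0.0
    n = 0
    for n, increment in enumerate(llr_increments, start=1):
        z += increment
        if z >= b:
            return n, 1
        if z <= -a:
            return n, 0
    return n, -1


# Hypothetical increments: a constant upward drift of 0.5 per observation.
print(sprt([0.5] * 10, a=2.0, b=2.0))  # (4, 1): the LLR reaches b = 2 at n = 4
```

With thresholds $b = |\log \alpha|$ and $a = |\log \beta|$, the stopping times produced by such a routine are the quantities whose moments appear in (25).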


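When $\mathcal{P}$ is not a singleton, it is natural to apply a generalized likelihood ratio approach and
consider the G-SLRT $\delta_{a,b} = (T_{a,b}, d)$ given by

$$T_{a,b} = \inf\Big\{ n \ge 1 : \max_{A \in \mathcal{P}} Z_n^A \notin (-a, b) \Big\}, \qquad d = \begin{cases} 1 & \text{when } \max_{A \in \mathcal{P}} Z_{T_{a,b}}^A \ge b, \\ 0 & \text{when } \max_{A \in \mathcal{P}} Z_{T_{a,b}}^A \le -a. \end{cases} \tag{26}$$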
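For the class (17), the maximization over subsets in (26) does not require enumerating all subsets, because $Z_n^A$ is a sum of per-stream LLRs: the maximizer takes the $\underline{m}$ largest local LLRs and then adds further streams, up to $\overline{m}$ in total, only while their LLRs are positive. The Python sketch below illustrates this; the function names are hypothetical, and the brute-force version is included only as a cross-check.

```python
from itertools import combinations
from typing import Sequence


def max_subset_llr(z: Sequence[float], m_lo: int, m_hi: int) -> float:
    """Maximum of sum_{k in A} z[k] over subsets A with m_lo <= |A| <= m_hi."""
    if not 1 <= m_lo <= m_hi <= len(z):
        raise ValueError("need 1 <= m_lo <= m_hi <= number of streams")
    ordered = sorted(z, reverse=True)
    # The m_lo largest local LLRs must be included regardless of sign.
    total = sum(ordered[:m_lo])
    # Additional streams can only help while their LLR is positive.
    for value in ordered[m_lo:m_hi]:
        if value <= 0:
            break
        total += value
    return total


def max_subset_llr_bruteforce(z: Sequence[float], m_lo: int, m_hi: int) -> float:
    """Same maximum by explicit enumeration (exponential cost, for checking only)."""
    return max(
        sum(subset)
        for size in range(m_lo, m_hi + 1)
        for subset in combinations(z, size)
    )


# Hypothetical local LLR values Z_n^1, ..., Z_n^5 at some fixed time n.
local_llrs = [0.5, -1.2, 2.0, -0.3, 0.1]
print(max_subset_llr(local_llrs, m_lo=1, m_hi=5))
print(max_subset_llr_bruteforce(local_llrs, m_lo=1, m_hi=5))
```

The sorted version costs $O(N \log N)$ per time step, which is one way to see why the statistic stays computable for a large number of streams.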
This test was considered by Tartakovsky et al. (2003b), where its asymptotic optimality was
established in the special case that signal can be present in only a single data stream, i.e.,
$\mathcal{P} = \{A : |A| = 1\}$. Theorem 4 below is a generalization of this result for an arbitrary class of
alternatives $\mathcal{P}$.


The following lemma gives upper bounds on the error probabilities of the G-SLRT, which
suggest threshold values that guarantee the target error probabilities. This lemma does not require
any assumptions on the local distributions. Let $|\mathcal{P}|$ denote the cardinality of class $\mathcal{P}$, i.e.,
the number of possible alternatives in $\mathcal{P}$. Note that $|\mathcal{P}|$ takes its maximum value when there is no
prior information regarding the subset of affected channels, in which case $|\mathcal{P}| = 2^N - 1$.

Lemma 1. For any thresholds $a, b > 0$,

$$P_0(d = 1) \le |\mathcal{P}|\, e^{-b} \quad \text{and} \quad \max_{A \in \mathcal{P}} P^A(d = 0) \le e^{-a}. \tag{27}$$

Therefore, for any target error probabilities $\alpha, \beta \in (0, 1)$, we can guarantee that $(T_{a,b}, d) \in C_{\alpha,\beta}(\mathcal{P})$
when thresholds are selected as

$$b = |\log(\alpha / |\mathcal{P}|)| \quad \text{and} \quad a = |\log \beta|. \tag{28}$$
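The threshold choice (28) is easy to evaluate numerically once $|\mathcal{P}|$ is known. The sketch below assumes the class (17), for which $|\mathcal{P}|$ is a sum of binomial coefficients; the function names `class_size` and `gslrt_thresholds` are hypothetical.

```python
from math import comb, exp, log
from typing import Tuple


def class_size(n_streams: int, m_lo: int, m_hi: int) -> int:
    """Number of subsets A of {1..n_streams} with m_lo <= |A| <= m_hi."""
    return sum(comb(n_streams, k) for k in range(m_lo, m_hi + 1))


def gslrt_thresholds(alpha: float, beta: float,
                     n_streams: int, m_lo: int, m_hi: int) -> Tuple[float, float]:
    """Thresholds (a, b) from (28): b = |log(alpha/|P|)|, a = |log(beta)|."""
    size = class_size(n_streams, m_lo, m_hi)
    b = abs(log(alpha / size))
    a = abs(log(beta))
    return a, b


# Hypothetical setting: 10 streams, no prior information on the affected subset.
print(class_size(10, 1, 10))  # 1023, i.e. 2**10 - 1
a, b = gslrt_thresholds(alpha=0.01, beta=0.05, n_streams=10, m_lo=1, m_hi=10)
# By (27), the false alarm bound |P| * exp(-b) then equals alpha up to rounding.
print(a, b, class_size(10, 1, 10) * exp(-b))
```

Because $|\mathcal{P}|$ enters only through its logarithm, even the largest class ($2^N - 1$ subsets) raises $b$ by at most about $N \log 2$.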


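Theorem 4. Let the thresholds $b$ and $a$ in the GSLRT (26) be chosen so that $\delta_{a,b} \in C_{\alpha,\beta}(\mathcal{P})$ and
$b \sim |\log \alpha|$, $a \sim |\log \beta|$ as $\alpha_{\max} \to 0$, in particular $b = |\log(\alpha / |\mathcal{P}|)|$ and $a = |\log \beta|$. If for some
$r > 0$, the conditions (23) hold, i.e.,

$$\frac{1}{n} Z_n^k \xrightarrow[n \to \infty]{P_1^k\text{-}r\text{-quickly}} I_1^k \quad \text{and} \quad \frac{1}{n} Z_n^k \xrightarrow[n \to \infty]{P_0^k\text{-}r\text{-quickly}} -I_0^k, \qquad k = 1, \ldots, N, \tag{29}$$

then, for any class of alternatives $\mathcal{P}$ and all $0 < m \le r$ as $\alpha_{\max} \to 0$,

$$E_0 T_{a,b}^m \sim \Big( \frac{|\log \beta|}{\min_{A \in \mathcal{P}} I_0^A} \Big)^m \sim \inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E_0 T^m, \tag{30}$$

and for every $A \in \mathcal{P}$,

$$E^A T_{a,b}^m \sim \Big( \frac{|\log \alpha|}{I_1^A} \Big)^m \sim \inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E^A T^m. \tag{31}$$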


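Definition 2. Let $r > 0$. We say that the sequence $(Y_n)_{n \ge 1}$ converges r-completely under
probability measure $P$ to a constant $q$ as $n \to \infty$ and write

$$Y_n \xrightarrow[n \to \infty]{P\text{-}r\text{-completely}} q$$

if

$$\sum_{n=1}^{\infty} n^{r-1} P(|Y_n - q| > \varepsilon) < \infty \quad \text{for all } \varepsilon > 0.$$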
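To make Definition 2 concrete, the following Python sketch evaluates partial sums of the series for one simple case chosen purely for illustration: $Y_n$ is the sample mean of $n$ iid Gaussian variables with mean $q$ and standard deviation `sigma`, so that $P(|Y_n - q| > \varepsilon) = \mathrm{erfc}(\varepsilon \sqrt{n} / (\sigma \sqrt{2}))$ in closed form. The function name and the Gaussian model are assumptions, not part of the report.

```python
from math import erfc, sqrt


def r_complete_partial_sum(r: float, eps: float, sigma: float, n_terms: int) -> float:
    """Partial sum of sum_n n^(r-1) * P(|Y_n - q| > eps) for a Gaussian sample mean.

    Here Y_n ~ Normal(q, sigma^2 / n), so the tail probability is
    erfc(eps * sqrt(n) / (sigma * sqrt(2))).
    """
    return sum(
        n ** (r - 1) * erfc(eps * sqrt(n) / (sigma * sqrt(2.0)))
        for n in range(1, n_terms + 1)
    )


# The partial sums stabilize quickly because the Gaussian tail decays
# exponentially in n, which dominates the polynomial weight n^(r-1).
for n_terms in (10, 100, 1000):
    print(n_terms, r_complete_partial_sum(r=2.0, eps=0.5, sigma=1.0, n_terms=n_terms))
```

In this example the series converges for every $r > 0$, which illustrates why the condition can be checked directly from tail bounds on $Y_n$.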
This condition turns out to be weaker than the corresponding r-quick convergence (in general),
and more importantly, it is easier to check the r-complete convergence condition than the r-quick
condition. Therefore, as a next step, it is natural to replace conditions (29) with the corresponding
r-complete convergence conditions for the LLRs:

$$\frac{1}{n} Z_n^k \xrightarrow[n \to \infty]{P_1^k\text{-}r\text{-completely}} I_1^k \quad \text{and} \quad \frac{1}{n} Z_n^k \xrightarrow[n \to \infty]{P_0^k\text{-}r\text{-completely}} -I_0^k, \qquad k = 1, \ldots, N, \tag{32}$$

i.e., that for all $\varepsilon > 0$ and all $k = 1, \ldots, N$,

$$\sum_{n=1}^{\infty} n^{r-1} P_1^k\Big( \Big| \frac{Z_n^k}{n} - I_1^k \Big| > \varepsilon \Big) < \infty, \qquad \sum_{n=1}^{\infty} n^{r-1} P_0^k\Big( \Big| \frac{Z_n^k}{n} + I_0^k \Big| > \varepsilon \Big) < \infty. \tag{33}$$


The  following  theorem  spells  out  details. 

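Theorem 5. Let the thresholds $b$ and $a$ in the GSLRT (26) be chosen so that $\delta_{a,b} \in C_{\alpha,\beta}(\mathcal{P})$ and
$b \sim |\log \alpha|$, $a \sim |\log \beta|$ as $\alpha_{\max} \to 0$, in particular $b = |\log(\alpha / |\mathcal{P}|)|$ and $a = |\log \beta|$. If, for some
$r > 0$, the r-complete convergence conditions (32) hold, then, for any class of alternatives $\mathcal{P}$ and
all $0 < m \le r$ as $\alpha_{\max} \to 0$,

$$E_0 T_{a,b}^m \sim \Big( \frac{|\log \beta|}{\min_{A \in \mathcal{P}} I_0^A} \Big)^m \sim \inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E_0 T^m, \tag{34}$$

and for every $A \in \mathcal{P}$,

$$E^A T_{a,b}^m \sim \Big( \frac{|\log \alpha|}{I_1^A} \Big)^m \sim \inf_{\delta \in C_{\alpha,\beta}(\mathcal{P})} E^A T^m. \tag{35}$$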


We now consider a special case where the LLR increments $\ell_n^k = Z_n^k - Z_{n-1}^k$, $n \ge 1$, in the kth channel are independent, but not necessarily identically distributed, random variables, and show that the asymptotic optimality properties (34)-(35) hold true for any positive integer $m$, as long as only the SLLN holds, i.e., as long as the almost sure convergence conditions

\[
\frac{1}{n} Z_n^k \xrightarrow[n\to\infty]{P_1^k\text{-a.s.}} I_1^k
\quad\text{and}\quad
\frac{1}{n} Z_n^k \xrightarrow[n\to\infty]{P_0^k\text{-a.s.}} -I_0^k,
\qquad k = 1, \dots, N, \tag{36}
\]

are satisfied. To this end, we need the following renewal theorem.

Lemma 2. Let $\xi^k := \{\xi_t^k\}_{t \ge 1}$, $1 \le k \le N$, be (possibly dependent) sequences of random variables on some probability space $(\Omega, \mathcal{F}, P)$ and let $E$ be the corresponding expectation. Define the stopping time

\[
\nu(b) := \inf\left\{ t \ge 1 : \min_{1 \le k \le N} S_t^k > b \right\}; \qquad S_t^k := \sum_{u=1}^{t} \xi_u^k.
\]

Suppose that for every $1 \le k \le N$ there is a positive constant $\mu_k$ such that $S_t^k / t \to \mu_k$ almost surely. Then, as $b \to \infty$, we have

\[
\frac{\nu(b)}{b} \xrightarrow{\text{a.s.}} \left( \min_{1 \le k \le N} \mu_k \right)^{-1}.
\]

Moreover, the convergence holds in $L^r$ for every $r > 0$ if each $\xi^k$ is a sequence of independent random variables and there is a $\lambda \in (0, 1)$ such that

\[
\sup_{t \ge 1,\; 1 \le k \le N} E\left[ \exp\{ \lambda (\xi_t^k)^- \} \right] < \infty. \tag{37}
\]

The following theorem establishes a stronger asymptotic optimality property for the G-SLRT in the case of LLRs with independent increments.


Theorem 6. Let $\mathcal{P}$ be an arbitrary class of possibly affected subsets of channels and suppose that the thresholds in the G-SLRT are selected so that $\delta_{a,b} \in C_{\alpha,\beta}(\mathcal{P})$ and $b \sim |\log \alpha|$, $a \sim |\log \beta|$ as $\alpha_{\max} \to 0$, in particular $b = |\log(\alpha/|\mathcal{P}|)|$ and $a = |\log \beta|$. If the LLR increments $\{\ell_n^k\}_{n \ge 1}$ are independent over time under $P_0^k$ and $P_1^k$ for every $1 \le k \le N$, then the asymptotic optimality properties (34)-(35) hold true for any $m \ge 1$, as long as the almost sure convergence conditions (36) hold.
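The renewal-type limit in Lemma 2 can be checked by simulation. The sketch below is only an illustration under stated assumptions: the N sequences are taken to be independent Gaussian random walks with hypothetical drifts (Gaussian increments satisfy the exponential moment condition (37)), and the function name is not from the report.

```python
import random


def first_crossing_time(b, drifts, rng, sigma=1.0):
    """nu(b) = inf{t >= 1 : min_k S_t^k > b} for independent Gaussian random walks.

    drifts[k] is the (assumed) positive drift mu_k of the k-th walk.
    """
    sums = [0.0] * len(drifts)
    t = 0
    while True:
        t += 1
        for k, mu in enumerate(drifts):
            sums[k] += rng.gauss(mu, sigma)
        if min(sums) > b:
            return t


if __name__ == "__main__":
    rng = random.Random(0)
    drifts = [0.5, 1.0, 2.0]  # hypothetical drifts mu_k
    for b in (50.0, 500.0, 5000.0):
        nu = first_crossing_time(b, drifts, rng)
        # nu / b should approach 1 / min_k mu_k as b grows.
        print(b, nu / b, 1.0 / min(drifts))
```

The slowest walk (smallest drift) dominates the crossing time, which is why only the minimum of the drifts enters the limit.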


2.2.2.  Asymptotic  Optimality  of  the  M-SLRT 

In  this  section,  we  propose  an  alternative  sequential  test  that  is  based  on  averaging,  instead  of 
maximizing,  the  likelihood  ratios  that  correspond  to  the  different  hypotheses.  We  show  that  it  has 
the  same  asymptotic  optimality  properties  and  similar  feasibility  as  the  G-SLRT. 

Let $\mathcal{P}$ be an arbitrary class, $\{p_A\}_{A \in \mathcal{P}}$ an arbitrary family of positive numbers that add up to 1 (weights), and consider the probability measure

\[
\bar{P} := \sum_{A \in \mathcal{P}} p_A P_A. \tag{38}
\]


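Then the Radon-Nikodym derivative of $\bar{P}$ versus $P_0$ given $\mathcal{F}_n$ is

\[
\bar{\Lambda}_n := \left. \frac{d\bar{P}}{dP_0} \right|_{\mathcal{F}_n} = \sum_{A \in \mathcal{P}} p_A \Lambda_n^A = \sum_{m=1}^{N} \sum_{A \in \mathcal{P} \cap \mathcal{P}_m} p_A \Lambda_n^A. \tag{39}
\]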


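If we replace the GLR statistic $\hat{Z}_n = \max_{A \in \mathcal{P}} Z_n^A$ in (26) by the logarithm of the mixture likelihood ratio, $\bar{Z}_n := \log \bar{\Lambda}_n$, then we obtain the following sequential test:

\[
T = \inf\left\{ n \ge 1 : \bar{Z}_n \notin (-a, b) \right\}, \qquad
d = \begin{cases} 1 & \text{when } \bar{Z}_T \ge b, \\ 0 & \text{when } \bar{Z}_T \le -a, \end{cases} \tag{40}
\]

which we refer to as the Mixture Sequential Likelihood Ratio Test (M-SLRT).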

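In the following lemma we show how to select the thresholds in order to guarantee the desired error control for the M-SLRT.

Lemma 3. For any positive thresholds $a$ and $b$ we have

\[
P_0(d = 1) \le e^{-b} \quad\text{and}\quad \max_{A \in \mathcal{P}} P_A(d = 0) \le \left( \min_{A \in \mathcal{P}} p_A \right)^{-1} e^{-a}. \tag{41}
\]

Therefore, for any $\alpha, \beta \in (0, 1)$, $(T, d) \in C_{\alpha,\beta}(\mathcal{P})$ when the thresholds are selected as follows:

\[
b = |\log \alpha| \quad\text{and}\quad a = |\log \beta| - \min_{A \in \mathcal{P}} (\log p_A). \tag{42}
\]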
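The threshold selection (42) and the bounds (41) are simple enough to sketch in code. The helper names below are hypothetical, and the weights in the usage example are arbitrary illustrative values.

```python
import math


def mslrt_thresholds(alpha, beta, weights):
    """Thresholds (a, b) from (42); `weights` are the p_A (positive, summing to 1)."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie in (0, 1)")
    if any(p <= 0.0 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    b = abs(math.log(alpha))
    a = abs(math.log(beta)) - min(math.log(p) for p in weights)
    return a, b


def error_bounds(a, b, weights):
    """Upper bounds from (41) on P_0(d = 1) and on max_A P_A(d = 0)."""
    return math.exp(-b), math.exp(-a) / min(weights)


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]  # illustrative p_A values
    a, b = mslrt_thresholds(alpha=0.01, beta=0.05, weights=weights)
    print(a, b, error_bounds(a, b, weights))
```

With thresholds chosen this way the two bounds in (41) equal alpha and beta, which is the sense in which (42) guarantees the desired error control.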


The following theorem shows that the M-SLRT has exactly the same asymptotic optimality properties as the G-SLRT.

Theorem 7. Consider an arbitrary class of possibly affected subsets, $\mathcal{P}$, and suppose that the thresholds of the M-SLRT are selected so that $\delta_{a,b} \in C_{\alpha,\beta}(\mathcal{P})$ and $b \sim |\log \alpha|$, $a \sim |\log \beta|$ as $\alpha_{\max} \to 0$, in particular according to (42). If the r-complete convergence conditions (32) hold, then for all $1 \le m \le r$ we have as $\alpha_{\max} \to 0$:

\[
E_0[T^m] \sim \left( \frac{|\log \beta|}{\min_{A \in \mathcal{P}} I_0^A} \right)^m \sim \inf_{(T,d) \in C_{\alpha,\beta}(\mathcal{P})} E_0[T^m], \tag{43}
\]

\[
E_A[T^m] \sim \left( \frac{|\log \alpha|}{I_1^A} \right)^m \sim \inf_{(T,d) \in C_{\alpha,\beta}(\mathcal{P})} E_A[T^m] \quad \text{for every } A \in \mathcal{P}. \tag{44}
\]

Moreover, if the LLRs $Z^k$ have independent increments, then the asymptotic relationships (43)-(44) hold for every $m > 0$ as long as the almost sure convergence conditions (36) are satisfied.


2.2.3.  Feasibility 


The implementation of the G-SLRT requires computing at each time $n$ the generalized log-likelihood ratio statistic

\[
\hat{Z}_n = \max_{A \in \mathcal{P}} Z_n^A = \max_{A \in \mathcal{P}} \sum_{k \in A} Z_n^k.
\]

A direct computation of $Z_n^A$ for every $A \in \mathcal{P}$ can be computationally expensive when the cardinality $|\mathcal{P}|$ of the class $\mathcal{P}$ is very large. However, the computation of $\hat{Z}_n$ is very easy for a class of the form $\mathcal{P}_{\underline{m}, \overline{m}}$, which contains all subsets of size at least $\underline{m}$ and at most $\overline{m}$. To see this, let us use the following notation for the order statistics: $Z_n^{(1)} \ge \dots \ge Z_n^{(N)}$, i.e., $Z_n^{(1)}$ is the top local LLR statistic and $Z_n^{(N)}$ is the smallest LLR at time $n$.

When the size of the affected subset is known in advance, i.e., $\underline{m} = \overline{m} = m$, we have

\[
\hat{Z}_n = \sum_{k=1}^{m} Z_n^{(k)}. \tag{45}
\]

Indeed, for any $A \in \mathcal{P}_m$ we have $Z_n^A \le \sum_{k=1}^{m} Z_n^{(k)}$. Therefore, $\hat{Z}_n \le \sum_{k=1}^{m} Z_n^{(k)}$, and the upper bound is attained by the subset which consists of the $m$ channels with the highest LLR values at time $n$.

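In the more general case that $\underline{m} < \overline{m}$ we have

\[
\hat{Z}_n = \sum_{k=1}^{\underline{m}} Z_n^{(k)} + \sum_{k=\underline{m}+1}^{\overline{m}} \left( Z_n^{(k)} \right)^+,
\]

and the G-SLRT takes the following form:

\[
T = \inf\left\{ n \ge 1 : \sum_{k=1}^{\overline{m}} \left( Z_n^{(k)} \right)^+ \ge b \ \text{ or } \ \sum_{k=1}^{\underline{m}} Z_n^{(k)} \le -a \right\}, \qquad
d = \begin{cases} 1 & \text{when } \sum_{k=1}^{\overline{m}} \left( Z_T^{(k)} \right)^+ \ge b, \\ 0 & \text{when } \sum_{k=1}^{\underline{m}} Z_T^{(k)} \le -a. \end{cases} \tag{46}
\]

Indeed, for any $A \in \mathcal{P}_{\underline{m}, \overline{m}}$ we have

\[
Z_n^A \le \sum_{k=1}^{\underline{m}} Z_n^{(k)} + \sum_{k=\underline{m}+1}^{\overline{m}} \left( Z_n^{(k)} \right)^+,
\]

and the upper bound is attained by the subset which consists of the $\underline{m}$ channels with the top $\underline{m}$ LLRs and the next (if any) top $\overline{m} - \underline{m}$ channels that have positive LLRs.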
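The order-statistics shortcut for the generalized LLR statistic can be checked against a brute-force maximization over subsets. The following sketch is illustrative only; the function names are hypothetical, the lower size bound is assumed to be at least 1, and the brute-force routine is meant for small N.

```python
from itertools import combinations


def glr_statistic(llrs, m_lo, m_hi):
    """Max over subsets A with m_lo <= |A| <= m_hi of sum_{k in A} llrs[k].

    Uses order statistics: the top m_lo LLRs are always included, and the next
    m_hi - m_lo are included only if they are positive.
    """
    ordered = sorted(llrs, reverse=True)
    forced = sum(ordered[:m_lo])
    optional = sum(max(z, 0.0) for z in ordered[m_lo:m_hi])
    return forced + optional


def glr_brute_force(llrs, m_lo, m_hi):
    """Same maximum computed by enumerating all admissible subsets."""
    best = float("-inf")
    for size in range(m_lo, m_hi + 1):
        for subset in combinations(llrs, size):
            best = max(best, sum(subset))
    return best


if __name__ == "__main__":
    local_llrs = [1.5, -0.7, 0.2, -2.0, 0.9]  # illustrative values of Z_n^k
    print(glr_statistic(local_llrs, 2, 3), glr_brute_force(local_llrs, 2, 3))
```

The sorted version costs O(N log N) per time step, whereas the enumeration grows combinatorially with N, which is the feasibility point made above.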

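Similarly to the G-SLRT, the M-SLRT is computationally feasible even when $N$ is large. Indeed, with weights of the product form $p_A = C(\mathcal{P}) \prod_{k \in A} p_k$, where $C(\mathcal{P})$ is a normalizing constant, the mixture likelihood ratio takes the form

\[
\bar{\Lambda}_n = C(\mathcal{P}) \sum_{m=1}^{N} \sum_{A \in \mathcal{P} \cap \mathcal{P}_m} \prod_{k \in A} \left( p_k \Lambda_n^k \right).
\]

When in particular there is an upper and a lower bound on the size of the affected subset, i.e., $\mathcal{P} = \mathcal{P}_{\underline{m}, \overline{m}}$ for some $1 \le \underline{m} \le \overline{m} \le N$, the mixture likelihood ratio statistic takes the form

\[
\bar{\Lambda}_n = C(\mathcal{P}) \sum_{m=\underline{m}}^{\overline{m}} \sum_{A \in \mathcal{P}_m} \prod_{k \in A} \left( p_k \Lambda_n^k \right) \tag{47}
\]

and its computational complexity is polynomial in the number of channels, $N$. However, in the special case of complete uncertainty ($\underline{m} = 1$, $\overline{m} = N$), the M-SLRT requires only $O(N)$ operations. Indeed, if we set for simplicity $p_k = p$ and $\pi = p/(1+p)$, then the mixture likelihood ratio in (47) admits the following representation for the class $\mathcal{P} = \overline{\mathcal{P}}_N$:

\[
\bar{\Lambda}_n = C(\mathcal{P}) \left[ (1 - \pi)^{-N} \tilde{\Lambda}_n - 1 \right], \tag{48}
\]

where the statistic $\tilde{\Lambda}_n$ is defined as follows:

\[
\tilde{\Lambda}_n = \prod_{k=1}^{N} \left( 1 - \pi + \pi \Lambda_n^k \right). \tag{49}
\]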


Note that the statistic $\tilde{\Lambda}_n$ has an appealing statistical interpretation, as it is the likelihood ratio that corresponds to the case that each channel belongs to the affected subset with probability $\pi \in (0, 1)$. It is possible to use $\tilde{\Lambda}_n$ as the detection statistic and incorporate prior information by an appropriate selection of $\pi$. For instance, if we know the exact size of the affected subset, say $\mathcal{P} = \mathcal{P}_m$, we may set $\pi = m/N$, whereas if we know that at most $m$ channels may be affected, i.e., $\mathcal{P} = \overline{\mathcal{P}}_m$, then we may set $\pi = m/(2N)$.
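The O(N) product representation (48)-(49) can be verified numerically against the direct sum over all nonempty subsets in (47). The sketch below omits the normalizing constant C(P), which multiplies both sides; the function names and sample values are illustrative assumptions.

```python
from itertools import combinations
from math import prod


def mixture_lr_brute_force(local_lrs, p):
    """Sum over all nonempty subsets A of prod_{k in A} (p * Lambda^k).

    The normalizing constant C(P) is omitted; cost grows as 2^N.
    """
    n = len(local_lrs)
    total = 0.0
    for size in range(1, n + 1):
        for subset in combinations(local_lrs, size):
            total += prod(p * lr for lr in subset)
    return total


def mixture_lr_fast(local_lrs, p):
    """Same quantity in O(N) via (48)-(49) with pi = p / (1 + p)."""
    pi = p / (1.0 + p)
    n = len(local_lrs)
    tilde = prod(1.0 - pi + pi * lr for lr in local_lrs)
    return (1.0 - pi) ** (-n) * tilde - 1.0


if __name__ == "__main__":
    lrs = [0.5, 2.0, 1.3, 0.1]  # illustrative local likelihood ratios Lambda_n^k
    print(mixture_lr_brute_force(lrs, 0.4), mixture_lr_fast(lrs, 0.4))
```

The identity behind the shortcut is that the sum over all subsets (including the empty one) of products of p * Lambda^k equals the product of (1 + p * Lambda^k) over channels.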

2.3.  Asymptotic  Bayesian  Theory  of  Quickest  Changepoint  Detection 


The problem of rapid detection of abrupt changes in a state of a process or a system arises in a variety of applications from engineering problems (e.g., navigation integrity monitoring; Basseville and Nikiforov (1993); Tartakovsky et al. (2014b)), military applications (e.g., target detection and tracking in heavy clutter; Tartakovsky et al. (2014b)) to cyber security (e.g., quick detection of attacks in computer networks; Kent (2000); Tartakovsky (2013b); Tartakovsky et al. (2006a,b, 2014b)). In the present project, we are interested in a sequential setting assuming that as long as the behavior of the observation process is consistent with a "normal" (initial in-control) state, we
allow  the  process  to  continue.  If  the  state  changes,  then  we  need  to  detect  this  event  as  rapidly  as 
possible  while  controlling  for  the  risk  of  false  alarms.  In  other  words,  we  are  interested  in  design¬ 
ing  the  quickest  change-point  detection  procedure  that  optimizes  the  tradeoff  between  a  measure 
of  detection  delay  and  a  measure  of  the  frequency  of  false  alarms. 

In the beginning of the 1960s, Shiryaev (1963) developed a Bayesian sequential changepoint detection (quickest disorder detection) theory in the iid case, assuming that the observations are independent and identically distributed (iid) according to a distribution F pre-change and another distribution G post-change, and that the prior distribution of the change point is geometric. In particular, Shiryaev (1963) proved that the detection procedure that is based on thresholding the posterior probability of the change being active before the current time is strictly optimal, minimizing the average delay to detection in the class of procedures with a given probability of false alarm. Tartakovsky and Veeravalli (2005) generalized Shiryaev's theory to the non-iid case, covering very general discrete-time non-iid stochastic models and a wide class of prior distributions that includes distributions with both exponential tails and heavy tails. In particular, it was proved that the Shiryaev detection procedure is asymptotically optimal, i.e., it minimizes the average delay to detection as well as higher moments of the detection delay as the probability of a false alarm vanishes. Baron and Tartakovsky (2006) developed an asymptotic Bayesian theory for general continuous-time stochastic processes.

The key assumption in general asymptotic theories developed in Baron and Tartakovsky (2006); Tartakovsky and Veeravalli (2005) is a certain stability property of the log-likelihood ratio process between the "change" and "no-change" hypotheses, which was expressed in the form of the strong law of large numbers with a positive and finite number and its strengthened r-quick version. However, it is not easy (and in fact can be quite difficult) to verify r-quick convergence in particular applications and examples. For this reason, it was conjectured in Baron and Tartakovsky (2006); Tartakovsky and Veeravalli (2005) that essentially the same asymptotic results may be obtained under a weaker r-complete version of the strong law of large numbers for the log-likelihood ratio. In fact, in most examples provided in Baron and Tartakovsky (2006); Tartakovsky and Veeravalli (2005) and in the recent book by Tartakovsky et al. (2014b), verification of the r-quick convergence is replaced by verification of the r-complete convergence. Our main goal in this project is to confirm this conjecture, proving that the Shiryaev changepoint detection procedure is asymptotically optimal under the r-complete convergence condition for the suitably normalized log-likelihood ratio process.

In the following, we deal only with discrete time $t = n \in \mathbb{Z}_+ = \{0, 1, 2, \dots\}$. The continuous time case $t \in \mathbb{R}_+ = [0, \infty)$ is more "delicate" and will be considered elsewhere. Having said that, let $(\Omega, \mathcal{F}, \mathcal{F}_n, P)$, $n \in \mathbb{Z}_+$, be a filtered probability space, where the sub-$\sigma$-algebra $\mathcal{F}_n = \sigma(X^n)$ of $\mathcal{F}$ is assumed to be generated by the process $X^n = \{X_t\}_{1 \le t \le n}$ observed up to time $n$. Let $P_0$ and $P_\infty$ be two probability measures defined on this space, which are assumed to be mutually locally absolutely continuous, so that the restrictions $P_0^n$ and $P_\infty^n$ of these measures to the sigma-algebras $\mathcal{F}_n$ are mutually absolutely continuous for all $n \ge 1$.

We are interested in the following changepoint problem. In a "normal" mode, the observed process $X_n$ follows the measure $P_\infty$, and at an unknown time $\nu$ ($\nu \ge 0$) something happens and $X_n$ follows the measure $P_0$. The goal is to detect the change as soon as possible after it occurs, subject to a constraint on the risk of false alarms. The exact optimality criteria will be specified in Section 2.3.2.

2.3.1. A General Changepoint Model

Let $p_j(X^n)$, $j = \infty, 0$, denote densities of $P_j^n$ (with respect to some non-degenerate $\sigma$-finite measure), where $X^n = (X_1, \dots, X_n)$ is the sample of size $n$. For a fixed $\nu \in \mathbb{Z}_+$, the change induces a probability measure $P_\nu$ (correspondingly density $p_\nu(X^n) = p(X^n | \nu)$), which is a combination of the pre- and post-change densities:

\[
p_\nu(X^n) = p_\infty(X^\nu) \cdot p_0(X_{\nu+1}^n | X^\nu) = \prod_{i=1}^{\nu} p_\infty(X_i | X^{i-1}) \cdot \prod_{i=\nu+1}^{n} p_0(X_i | X^{i-1}), \tag{50}
\]

where $X_m^n = (X_m, \dots, X_n)$ and $p_j(X_n | X^{n-1})$ is the conditional density of $X_n$ given $X^{n-1}$. In the sequel we assume that $\nu$ is the serial number of the last pre-change observation. Note that in general the conditional densities $p_0(X_i | X^{i-1})$, $i = \nu+1, \nu+2, \dots$, may depend on the changepoint $\nu$, i.e., $p_0(X_i | X^{i-1}) = p_0^{(\nu)}(X_i | X^{i-1})$ for $i > \nu$. Certainly the densities $p_j(X_i | X^{i-1}) = p_{j,i}(X_i | X^{i-1})$, $j = 0, \infty$, may depend on $i$.

In the particular iid case, addressed in detail in the past, the observations are independent and identically distributed (iid) with density $f_\infty(x)$ in the normal (pre-change) mode and with another density $f_0(x)$ in the abnormal (post-change) mode, i.e., in this case, (50) holds with $p_\infty(X_i | X^{i-1}) = f_\infty(X_i)$ and $p_0(X_i | X^{i-1}) = f_0(X_i)$.
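For the iid special case, the changepoint density (50) reduces to a product of pre-change densities up to the changepoint and post-change densities afterwards. The sketch below illustrates this under the assumption (made only for the example) that the pre-change density is N(0, 1) and the post-change density is N(1, 1); all names are hypothetical.

```python
import math


def gauss_logpdf(x, mean, sd=1.0):
    """Log-density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_density_given_changepoint(xs, nu, pre_mean=0.0, post_mean=1.0):
    """log p_nu(X^n) from (50) in the iid case.

    xs[0:nu] are the pre-change observations X_1..X_nu and xs[nu:] are the
    post-change observations X_{nu+1}..X_n; nu >= len(xs) means no change yet.
    """
    pre = sum(gauss_logpdf(x, pre_mean) for x in xs[:nu])
    post = sum(gauss_logpdf(x, post_mean) for x in xs[nu:])
    return pre + post


if __name__ == "__main__":
    xs = [0.1, -0.3, 1.2, 0.8]
    no_change = log_density_given_changepoint(xs, len(xs))
    for nu in range(len(xs) + 1):
        # The difference is the LLR accumulated after the changepoint.
        print(nu, log_density_given_changepoint(xs, nu) - no_change)
```

The printed differences are sums of the per-observation log-likelihood ratios over the post-change segment, which is the quantity that the detection statistics in the following subsections are built from.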

We are interested in a Bayesian setting where the change point $\nu$ is assumed to be a random variable independent of the observations with prior probability distribution $\Pi_n = P(\nu \le n)$, $n \in \mathbb{Z}_+$. We also write $\pi_k = P(\nu = k)$ for the probability on non-negative integers, $k = 0, 1, 2, \dots$. Formally, we allow the change point $\nu$ to take negative values too, but the detailed distribution for $k < 0$ is not important. The only value we need is the cumulative probability $q = P(\nu < 0)$. The probability $P(\nu \le 0) = q + \pi_0$ is the probability of the "atom" associated with the event that the change already took place before the observations became available.

In the past, the typical choice for the prior distribution was the (zero-modified) geometric distribution,

\[
P(\nu < 0) = q \quad\text{and}\quad P(\nu = k) = (1 - q)\, p\, (1 - p)^k \quad \text{for } k = 0, 1, 2, \dots, \tag{51}
\]

where $0 \le q < 1$, $0 < p < 1$.

In the rest of the paper, we consider an arbitrary prior distribution that belongs to the class of distributions that satisfy the following condition:

C. For some $0 \le \mu < \infty$,

\[
\lim_{n \to \infty} \frac{|\log(1 - \Pi_n)|}{n} = \mu. \tag{52}
\]

In the case that $\mu = 0$, we assume in addition that for some $r \ge 1$

\[
\sum_{k=0}^{\infty} \pi_k |\log \pi_k|^r < \infty. \tag{53}
\]

If $\mu > 0$, then the prior distribution has an exponential right tail. Distributions such as the geometric and discrete versions of the gamma and logistic distributions, i.e., models with bounded hazard rate, belong to this class. In this case, condition (53) holds automatically. If $\mu = 0$, then the distribution has a heavy tail, i.e., it belongs to the model with a vanishing hazard rate. However, we cannot allow this distribution to have a tail that is too heavy, which is guaranteed by condition (53).
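Condition C is easy to check for the zero-modified geometric prior (51). The sketch below (with hypothetical function names) computes the tail probability 1 - Pi_n = P(nu > n) = (1 - q)(1 - p)^(n+1) in the log domain and shows the normalized log-tail approaching mu = |log(1 - p)|.

```python
import math


def prior_pmf(k, p, q=0.0):
    """pi_k = P(nu = k) for k >= 0 under the zero-modified geometric prior (51)."""
    return (1.0 - q) * p * (1.0 - p) ** k


def log_tail(n, p, q=0.0):
    """log P(nu > n) = log(1 - Pi_n), computed in the log domain to avoid underflow."""
    return math.log(1.0 - q) + (n + 1) * math.log(1.0 - p)


def tail_exponent(n, p, q=0.0):
    """|log(1 - Pi_n)| / n, the quantity whose limit defines mu in (52)."""
    return abs(log_tail(n, p, q)) / n


if __name__ == "__main__":
    p, q = 0.1, 0.2
    mu = abs(math.log(1.0 - p))
    for n in (10, 100, 1000, 10000):
        print(n, tail_exponent(n, p, q), mu)
```

For a heavy-tailed prior (mu = 0) the same ratio would tend to zero, and the additional moment-type condition (53) would have to be checked separately.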




FTR ARO Grant # W911NF-14-1-0246: General Multidecision Theory: Hypothesis Testing and Changepoint Detection with Applications to Homeland Security


2.3.2.  Optimality  Criteria 


Any sequential detection procedure is a stopping time $T$ for the observed process $\{X_n\}_{n \in \mathbb{Z}_+}$, i.e., $T$ is an extended random variable such that the event $\{T = n\}$ belongs to the sigma-algebra $\mathcal{F}_n$. A false alarm is raised whenever $T \le \nu$. A good detection procedure should guarantee a small delay to detection $T - \nu$ provided that there is no false alarm, while the rate (or risk) of false alarms should be kept at a given, usually low, level.

Let $P_k$ and $E_k$ denote the probability and the corresponding expectation when the change occurs at time $\nu = k \in \mathbb{Z}_+$. In what follows, $P^\pi$ denotes the probability measure on the Borel sigma-algebra in $\mathbb{R}^\infty \times \mathbb{N}$ defined as $P^\pi(\mathcal{A} \times J) = \sum_{k \in J} \pi_k P_k(\mathcal{A})$ for $\mathcal{A} \in \mathcal{B}(\mathbb{R}^\infty)$, $J \subset \mathbb{N}$, and $E^\pi$ denotes the expectation with respect to $P^\pi$.

In a Bayesian setting, the risk associated with the delay to detection is usually measured by the average delay to detection

\[
E^\pi(T - \nu \mid T > \nu) = \frac{\sum_{k=0}^{\infty} \pi_k E_k(T - k \mid T > k) P_\infty(T > k)}{1 - \mathrm{PFA}(T)} \tag{54}
\]

and the risk associated with a false alarm by the weighted probability of false alarm (PFA) defined as

\[
\mathrm{PFA}(T) = P^\pi(T \le \nu) = \sum_{k=1}^{\infty} \pi_k P_\infty(T \le k). \tag{55}
\]

In (54) and (55) we use the fact that $P_k(T > k) = P_\infty(T > k)$ and $P_k(T \le k) = P_\infty(T \le k)$ for $k \in \mathbb{Z}_+$ and that $P_\infty(T \le 0) = 0$.

For $0 < \alpha < 1$, let $C_\alpha = \{T : \mathrm{PFA}(T) \le \alpha\}$ be a class of detection procedures for which the weighted probability of false alarm does not exceed the predefined level $\alpha$. In a Bayesian setting, the goal is to find an optimal procedure that minimizes in the class $C_\alpha$ the average delay to detection, i.e.,

\[
\text{find } T_{\mathrm{opt}} \in C_\alpha \text{ such that } E^\pi(T_{\mathrm{opt}} - \nu \mid T_{\mathrm{opt}} > \nu) = \inf_{T \in C_\alpha} E^\pi(T - \nu \mid T > \nu).
\]

However, except for the iid case, the solution of this problem is not tractable. For this reason, we address the asymptotic problem of minimizing the average detection delay as $\alpha$ approaches zero. For practical purposes, it is also interesting to consider the problem of minimizing higher moments of the detection delay $E^\pi[(T - \nu)^m \mid T > \nu]$ for some $m \ge 1$, i.e., to find a first-order asymptotically optimal detection procedure $T_0 \in C_\alpha$ that satisfies

\[
\lim_{\alpha \to 0} \frac{E^\pi[(T_0 - \nu)^m \mid T_0 > \nu]}{\inf_{T \in C_\alpha} E^\pi[(T - \nu)^m \mid T > \nu]} = 1. \tag{56}
\]


2.3.3.  Change  Detection  Procedures 


Let "$H_k : \nu = k$" and "$H_\infty : \nu = \infty$" be the hypotheses that the change occurs at the point $0 \le k < \infty$ and that the change never happens, respectively. Then, using (50), we obtain that the likelihood ratio (LR) between these hypotheses when the sample $X^n = (X_1, \dots, X_n)$ is observed is

\[
\frac{dP_k^n}{dP_\infty^n} = \prod_{i=k+1}^{n} \frac{p_0(X_i | X^{i-1})}{p_\infty(X_i | X^{i-1})}, \qquad k < n.
\]

Write $\mathcal{L}_i = p_0(X_i | X^{i-1}) / p_\infty(X_i | X^{i-1})$ and introduce the normalized average (weighted) LR

\[
\Lambda_n = \frac{1}{P(\nu \ge n)} \left( q \prod_{i=1}^{n} \mathcal{L}_i + \sum_{k=0}^{n-1} \pi_k \prod_{i=k+1}^{n} \mathcal{L}_i \right), \qquad n \ge 1.
\]

Note that $\Lambda_0 = q/(1 - q)$. Let $g_n = P(\nu < n | X^n)$ stand for the posterior probability of the change being in effect up to time $n$. Shiryaev (1963) proved that, in the iid case, the detection procedure $T_a = \inf\{n : g_n \ge a\}$ is strictly optimal for every $0 < \alpha < 1$, i.e., it minimizes the average detection delay $E^\pi(T - \nu \mid T > \nu)$ if $a = a_\alpha$ is selected so that $\mathrm{PFA}(T_a) = \alpha$ and the prior distribution is geometric. We refer to this procedure as the Shiryaev detection procedure in the general non-iid case too. We now show that $\Lambda_n = g_n/(1 - g_n)$, so that the Shiryaev procedure can be written as

\[
T_A = \inf\{n \ge 1 : \Lambda_n \ge A\}, \qquad A > 0. \tag{57}
\]

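Indeed, $g_n = \sum_{k=-\infty}^{n-1} P(\nu = k | X^n)$, where, for $0 \le k \le n-1$,

\[
P(\nu = k | X^n) = \frac{\pi_k \prod_{i=1}^{k} p_\infty(X_i | X^{i-1}) \prod_{i=k+1}^{n} p_0(X_i | X^{i-1})}{q \prod_{i=1}^{n} p_0(X_i | X^{i-1}) + \sum_{j=0}^{n-1} \pi_j \prod_{i=1}^{j} p_\infty(X_i | X^{i-1}) \prod_{i=j+1}^{n} p_0(X_i | X^{i-1}) + P(\nu \ge n) \prod_{i=1}^{n} p_\infty(X_i | X^{i-1})}
= \frac{\pi_k \prod_{i=k+1}^{n} \mathcal{L}_i}{q \prod_{i=1}^{n} \mathcal{L}_i + \sum_{j=0}^{n-1} \pi_j \prod_{i=j+1}^{n} \mathcal{L}_i + P(\nu \ge n)},
\]

while the terms with $k < 0$ contribute $q \prod_{i=1}^{n} \mathcal{L}_i$ to the numerator in total, and we obtain

\[
g_n = \frac{q \prod_{i=1}^{n} \mathcal{L}_i + \sum_{k=0}^{n-1} \pi_k \prod_{i=k+1}^{n} \mathcal{L}_i}{q \prod_{i=1}^{n} \mathcal{L}_i + \sum_{k=0}^{n-1} \pi_k \prod_{i=k+1}^{n} \mathcal{L}_i + P(\nu \ge n)}.
\]

Therefore,

\[
\frac{g_n}{1 - g_n} = \frac{1}{P(\nu \ge n)} \left( q \prod_{i=1}^{n} \mathcal{L}_i + \sum_{k=0}^{n-1} \pi_k \prod_{i=k+1}^{n} \mathcal{L}_i \right) = \Lambda_n.
\]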

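In particular, in the popular case of the zero-modified geometric prior (51), the statistic $\Lambda_n$ is

\[
\Lambda_n = \frac{q}{1 - q} \prod_{i=1}^{n} \frac{\mathcal{L}_i}{1 - p} + p \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{\mathcal{L}_i}{1 - p}. \tag{58}
\]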


In the following, to avoid triviality, we assume that $A > q/(1 - q)$, since otherwise $T_A = 0$ with probability 1.

By Lemma 7.2.1 in Tartakovsky et al. (2014b),

\[
\mathrm{PFA}(T_A) \le 1/(1 + A) \quad \text{for every } A > q/(1 - q), \tag{59}
\]

and therefore setting $A = A_\alpha = (1 - \alpha)/\alpha$ guarantees that $T_{A_\alpha} \in C_\alpha$.

Another popular change detection procedure is the Shiryaev-Roberts (SR) procedure (due to Shiryaev (1963) and Roberts (1966)) given by the stopping time

\[
T_B = \inf\{n \ge 1 : R_n \ge B\}, \qquad B > 0, \tag{60}
\]

where the statistic $R_n$, the SR statistic, is given by

\[
R_n = \sum_{k=1}^{n} \prod_{i=k}^{n} \mathcal{L}_i, \qquad n \ge 1 \quad (R_0 = 0). \tag{61}
\]

The statistic $R_n$ can be viewed as a limit of the statistic $\Lambda_n / p$ as $p \to 0$ when the prior distribution of the change point is geometric (51) with $q = 0$; see (58).
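The closed forms (58) and (61) can be evaluated recursively. The recursions used below, Lambda_n = (Lambda_{n-1} + p) L_n / (1 - p) and R_n = (1 + R_{n-1}) L_n, are obtained here by factoring the last likelihood ratio out of (58) and (61); they are stated as a derived convenience rather than quoted from the report, and the function names are hypothetical.

```python
from math import prod


def shiryaev_direct(lrs, p, q=0.0):
    """Lambda_n from (58) for n = len(lrs); lrs[i-1] is the LR of observation i."""
    n = len(lrs)
    scaled = [lr / (1.0 - p) for lr in lrs]
    head = q / (1.0 - q) * prod(scaled)
    tail = sum(prod(scaled[k - 1:]) for k in range(1, n + 1))
    return head + p * tail


def shiryaev_recursive(lrs, p, q=0.0):
    """Same statistic via Lambda_n = (Lambda_{n-1} + p) * L_n / (1 - p), Lambda_0 = q / (1 - q)."""
    stat = q / (1.0 - q)
    for lr in lrs:
        stat = (stat + p) * lr / (1.0 - p)
    return stat


def shiryaev_roberts(lrs):
    """SR statistic (61) via R_n = (1 + R_{n-1}) * L_n, R_0 = 0."""
    stat = 0.0
    for lr in lrs:
        stat = (1.0 + stat) * lr
    return stat


if __name__ == "__main__":
    lrs = [1.2, 0.7, 2.5, 0.9]  # illustrative likelihood ratios L_1..L_4
    print(shiryaev_direct(lrs, 0.1, 0.3), shiryaev_recursive(lrs, 0.1, 0.3))
    # With q = 0 and small p, Lambda_n / p is close to the SR statistic R_n.
    print(shiryaev_recursive(lrs, 1e-8) / 1e-8, shiryaev_roberts(lrs))
```

The recursive form needs O(1) work per new observation, whereas evaluating (58) directly costs O(n^2) with the naive products used here.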

2.3.4.  r-Quick  Convergence  Versus  r-Complete  Convergence 

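Introduce the LLRs

\[
Z_i = \log \frac{p_0(X_i | X^{i-1})}{p_\infty(X_i | X^{i-1})}, \qquad
\lambda_{k+n}^k = \log \frac{dP_k^{k+n}}{dP_\infty^{k+n}} = \sum_{i=k+1}^{k+n} Z_i, \quad n \ge 1.
\]

We need the following two definitions.

Definition 3. Let $r > 0$. For $k = 0, 1, 2, \dots$, we say that the normalized LLR $n^{-1} \lambda_{k+n}^k$ converges r-quickly to a constant $I$ as $n \to \infty$ under probability $P_k$ if $E_k[L_k(\varepsilon)]^r < \infty$ for all $\varepsilon > 0$, where $L_k(\varepsilon) = \sup\{n \ge 1 : |n^{-1} \lambda_{k+n}^k - I| > \varepsilon\}$ ($\sup\{\varnothing\} = 0$) is the last time when $n^{-1} \lambda_{k+n}^k$ leaves the interval $[I - \varepsilon, I + \varepsilon]$.

Definition 4. Let $r > 0$. For $k = 0, 1, 2, \dots$, we say that the normalized LLR $n^{-1} \lambda_{k+n}^k$ converges r-completely to a constant $I$ as $n \to \infty$ under probability $P_k$ if for all $\varepsilon > 0$,

\[
\sum_{n=1}^{\infty} n^{r-1} P_k\left\{ \left| n^{-1} \lambda_{k+n}^k - I \right| > \varepsilon \right\} < \infty. \tag{62}
\]

(For $r = 1$ this mode of convergence was introduced by Hsu and Robbins (1947).)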

Note  first  that  in  general  r-quick  convergence  is  a  stronger  property  than  r-complete  conver¬ 
gence.  See  Lemma  2.4.1  in  Tartakovsky  et  al.  (2014b).  More  importantly,  checking  r-quick  con¬ 
vergence  in  applications  is  often  much  more  difficult  than  checking  r-complete  convergence. 

In the discrete time case, Tartakovsky and Veeravalli (2005) developed a general asymptotic Bayesian theory of changepoint detection assuming that the LLR obeys the strong law of large numbers (SLLN) with some positive and finite constant $I$, i.e.,

\[
\frac{1}{n} \lambda_{k+n}^k \xrightarrow[n\to\infty]{P_k\text{-a.s.}} I \quad \text{for all } k \in \mathbb{Z}_+, \tag{63}
\]

with a certain rate of convergence expressed via the r-quick convergence, specifically assuming in addition that for some $r \ge 1$

\[
\sum_{k=0}^{\infty} \pi_k E_k[L_k(\varepsilon)]^r < \infty. \tag{64}
\]

A similar development was performed by Baron and Tartakovsky (2006) in continuous time, assuming that

\[
\int_0^\infty E_u[L_u(\varepsilon)]^r \, d\Pi_u < \infty.
\]

However, as we already mentioned, verification of the latter r-quick convergence condition in particular examples is not an easy task.


In Baron and Tartakovsky (2006); Tartakovsky and Veeravalli (2005), it was conjectured that all asymptotic results, including near optimality of the Shiryaev procedure (in the sense defined in (56)), hold if the r-quick convergence condition (64) is weakened into the r-complete convergence

\[
\sum_{k=0}^{\infty} \pi_k \sum_{n=1}^{\infty} n^{r-1} P_k\left\{ \left| n^{-1} \lambda_{k+n}^k - I \right| > \varepsilon \right\} < \infty
\]

(with an obvious modification in continuous time). In the following subsections, we justify this conjecture.


2.3.5.  Asymptotic  Operating  Characteristics  and  Optimality  of  the  Shiryaev  Procedure 


In  this  subsection,  we  present  the  main  results  related  to  asymptotic  optimality  of  the  Shiryaev 
detection  procedure  in  the  general  non-iid  case  as  well  as  in  the  case  of  independent  observations. 

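The following lemma, which establishes the asymptotic lower bounds for moments of the detection delay, will be used for proving asymptotic optimality properties.

Lemma 4. Let $T_A$ be the Shiryaev changepoint detection procedure defined in (57). Let, for some $\mu \ge 0$, the prior distribution of the change point satisfy condition (52). Assume that for some positive and finite $I$

\[
\lim_{M \to \infty} P_k\left( \frac{1}{M} \max_{1 \le n \le M} \lambda_{k+n}^k \ge (1 + \varepsilon) I \right) = 0 \quad \text{for all } \varepsilon > 0 \text{ and all } k \in \mathbb{Z}_+. \tag{65}
\]

Then, for all $m > 0$,

\[
\liminf_{\alpha \to 0} \frac{\inf_{T \in C_\alpha} E^\pi[(T - \nu)^m \mid T > \nu]}{|\log \alpha|^m} \ge \frac{1}{(I + \mu)^m} \tag{66}
\]

and

\[
\liminf_{A \to \infty} \frac{E^\pi[(T_A - \nu)^m \mid T_A > \nu]}{(\log A)^m} \ge \frac{1}{(I + \mu)^m}. \tag{67}
\]

Define

\[
\Upsilon_{k,r}(\varepsilon) = \sum_{n=1}^{\infty} n^{r-1} P_k\left( \frac{1}{n} \lambda_{k+n}^k < I - \varepsilon \right). \tag{68}
\]

Recall that by (59), $\mathrm{PFA}(T_A) \le (1 + A)^{-1}$ for any $A > q/(1 - q)$, which implies that $\mathrm{PFA}(T_{A_\alpha}) \le \alpha$ (i.e., $T_{A_\alpha} \in C_\alpha$) for any $0 < \alpha < 1 - q$ if $A = A_\alpha = (1 - \alpha)/\alpha$.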

The  following  theorem  is  the  main  result  in  the  general  non-iid  case,  which  shows  that  the 
Shiryaev  detection  procedure  is  asymptotically  optimal  under  mild  conditions  for  the  observations 
and  prior  distributions. 


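Theorem 8. Let $T_A$ be the Shiryaev changepoint detection procedure defined in (57). Let $r \ge 1$ and let the prior distribution of the change point satisfy condition (C). Assume that for some number $0 < I < \infty$ condition (65) is satisfied and that the following condition holds as well:

\[
\sum_{k=0}^{\infty} \pi_k \Upsilon_{k,r}(\varepsilon) < \infty \quad \text{for all } \varepsilon > 0. \tag{69}
\]

(i) Then for all $0 < m \le r$

\[
\lim_{A \to \infty} \frac{E^\pi[(T_A - \nu)^m \mid T_A > \nu]}{(\log A)^m} = \frac{1}{(I + \mu)^m}. \tag{70}
\]

(ii) If $A = A_\alpha = (1 - \alpha)/\alpha$, where $0 < \alpha < 1 - q$, then $T_{A_\alpha} \in C_\alpha$ and it is asymptotically optimal as $\alpha \to 0$ in class $C_\alpha$, minimizing moments of the detection delay up to order $r$, i.e., for all $0 < m \le r$,

\[
\lim_{\alpha \to 0} \frac{\inf_{T \in C_\alpha} E^\pi[(T - \nu)^m \mid T > \nu]}{E^\pi[(T_{A_\alpha} - \nu)^m \mid T_{A_\alpha} > \nu]} = 1. \tag{71}
\]

Also, the following first-order asymptotic approximations hold:

\[
\inf_{T \in C_\alpha} E^\pi[(T - \nu)^m \mid T > \nu] \sim E^\pi[(T_{A_\alpha} - \nu)^m \mid T_{A_\alpha} > \nu] \sim \left( \frac{|\log \alpha|}{I + \mu} \right)^m \quad \text{as } \alpha \to 0. \tag{72}
\]

This assertion also holds if $A = A_\alpha$ is selected so that $\mathrm{PFA}(T_{A_\alpha}) \le \alpha$ and $\log A_\alpha \sim |\log \alpha|$ as $\alpha \to 0$.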


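Corollary 1. Let $r \ge 1$. Let the prior distribution of the change point satisfy condition (C). Assume that for some $0 < I < \infty$

\[
\sum_{k=0}^{\infty} \pi_k \sum_{n=1}^{\infty} n^{r-1} P_k\left\{ \left| \frac{\lambda_{k+n}^k}{n} - I \right| > \varepsilon \right\} < \infty \quad \text{for all } \varepsilon > 0. \tag{73}
\]

Then (70), (71) and (72) hold true.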
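To give a feel for the first-order approximation (72), the sketch below evaluates the threshold A = (1 - alpha)/alpha and the approximation (|log alpha| / (I + mu))^m. The numerical values (I = 0.5, corresponding to a unit mean shift in unit-variance Gaussian noise, and a geometric prior with p = 0.1) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math


def shiryaev_threshold(alpha):
    """A_alpha = (1 - alpha) / alpha, which gives PFA <= alpha by (59)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return (1.0 - alpha) / alpha


def delay_moment_approx(alpha, info, mu, m=1.0):
    """First-order approximation (|log alpha| / (I + mu))^m from (72)."""
    return (abs(math.log(alpha)) / (info + mu)) ** m


if __name__ == "__main__":
    info = 0.5                   # assumed information number I
    mu = abs(math.log(1 - 0.1))  # geometric prior with p = 0.1
    for alpha in (1e-1, 1e-2, 1e-3, 1e-4):
        print(alpha, shiryaev_threshold(alpha), delay_moment_approx(alpha, info, mu))
```

The approximation is first-order only: it captures the logarithmic growth of the delay as alpha decreases but not constant-order correction terms.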


The above results show that the lower bounds (66) and (67) for moments of the detection delay hold whenever the LLR process $\lambda_{k+n}^k$ obeys the SLLN (63), since in this case condition (65) is satisfied. However, in general, the almost sure convergence (63) is not sufficient for obtaining the upper bounds and, therefore, for asymptotic optimality of the Shiryaev procedure. In fact, this condition does not even guarantee finiteness of the average delay to detection $E^\pi(T_A - \nu \mid T_A > \nu)$, and to obtain meaningful results we need to strengthen the SLLN into the r-complete version. On the other hand, in the iid case, where conditioned on $\nu = k$ the observations $X_1, \dots, X_k$ are iid with pre-change density $f_\infty(x)$ and $X_{k+1}, X_{k+2}, \dots$ are iid with post-change density $f_0(x)$, the situation is dramatically different. By Theorem 4 of Tartakovsky and Veeravalli (2005), the Shiryaev procedure asymptotically (as $\alpha \to 0$) minimizes all positive moments of the detection delay in class $C_\alpha$ if the prior distribution is geometric and the Kullback-Leibler information number

\[
\mathcal{K} = E_0 \lambda_1^0 = \int \log\left( \frac{f_0(x)}{f_\infty(x)} \right) f_0(x) \, d\mu(x) \tag{74}
\]

(where $\mu(x)$ here denotes the dominating measure of the densities, not the prior tail parameter in condition (C)) is positive and finite.
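As a concrete instance of the Kullback-Leibler information number, consider the assumed example in which the pre-change density is N(0, sd^2) and the post-change density is N(theta, sd^2); the closed form is then theta^2 / (2 sd^2). The sketch below checks this with a midpoint-rule integral; the Gaussian model and all function names are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kl_information(post_mean, pre_mean=0.0, sd=1.0, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of the integral of log(f_0 / f_inf) * f_0.

    f_0 = N(post_mean, sd^2) is the post-change density and
    f_inf = N(pre_mean, sd^2) is the pre-change density.
    """
    lo = post_mean - half_width * sd
    h = 2.0 * half_width * sd / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        # Log-ratio of the two Gaussian densities, written analytically to avoid log(0).
        log_ratio = ((x - pre_mean) ** 2 - (x - post_mean) ** 2) / (2.0 * sd * sd)
        total += log_ratio * normal_pdf(x, post_mean, sd) * h
    return total


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        print(theta, kl_information(theta), theta ** 2 / 2.0)
```

In this example the information number is positive and finite for every nonzero shift, so the iid result quoted above applies.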

We now extend this result to the case where observations are independent, but not necessarily identically distributed, i.e., $p_\infty(X_i | X^{i-1}) = f_{\infty,i}(X_i)$ and $p_0(X_i | X^{i-1}) = f_{0,i}(X_i)$ in (50). More generally, we may assume that the increments $Z_i$ of the LLR $\lambda_{k+n}^k = \sum_{i=k+1}^{k+n} Z_i$ are independent, which is always the case if the observations are independent. This slight generalization is important for certain examples with dependent observations that lead to the LLR with independent increments.

Theorem 9. Let $T_A$ be the Shiryaev changepoint detection procedure defined in (57). Let $r \ge 1$. Assume that the LLR process $\{\lambda_{k+n}^k\}_{n \ge 1}$ has independent, not necessarily identically distributed increments under $P_k$, $k \in \mathbb{Z}_+$. Suppose that condition (65) holds and the following condition

\[
\lim_{n \to \infty} P_k\left( \frac{1}{n} \lambda_{i+n}^i < I - \varepsilon \right) = 0 \quad \text{for all } \varepsilon > 0, \text{ all } i \ge k \text{ and all } k \in \mathbb{Z}_+ \tag{75}
\]

is satisfied. Let the prior distribution of the change point be geometric (51) with $q = 0$. Then relations (70), (71) and (72) hold true for all $m > 0$ with $\mu = |\log(1 - p)|$. Therefore, the Shiryaev procedure $T_{A_\alpha}$ minimizes asymptotically as $\alpha \to 0$ all positive moments of the detection delay in class $C_\alpha$.

The idea of relaxing the r-complete convergence condition by condition (75) is based on splitting integration, when obtaining the upper bound for the expectation $\mathsf{E}_k[(T_A - k)^+]^r$, into a sequence of intervals (cycles) of the size $N_A \approx \log A/(I + \mu)$ and then showing that $\mathsf{P}_k(T_A - k > \ell N_A) \le \delta^\ell$, $\ell = 1, 2, \dots$ for some small $\delta$ under condition (75), using independence of the LLR increments.
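
The step from the geometric tail bound over cycles to a moment bound can be sketched as follows. This is an illustrative derivation, not the report's proof: it assumes the tail bound holds for every $\ell \ge 1$ with some $\delta < 1$, and the shorthand $Y$ and $N$ is introduced here only for readability.

```latex
% Shorthand (assumption): Y = (T_A - k)^+, N = N_A,
% and P_k(Y > \ell N) \le \delta^\ell for all \ell \ge 1.
Y^r \le \sum_{\ell=0}^{\infty} \big((\ell+1)N\big)^r \, \mathbf{1}\{\ell N < Y \le (\ell+1)N\}
\quad\Longrightarrow\quad
\mathsf{E}_k[Y^r]
  \le N^r \sum_{\ell=0}^{\infty} (\ell+1)^r \, \mathsf{P}_k(Y > \ell N)
  \le N^r \Big( 1 + \sum_{\ell=1}^{\infty} (\ell+1)^r \delta^{\ell} \Big).
```

The series on the right converges for any $\delta < 1$ and vanishes as $\delta \to 0$, so under these assumptions the $r$-th moment of the delay is bounded by a quantity close to $N_A^r$ when $\delta$ is small.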

There are many examples associated with Markov and hidden Markov models (and even more general ones) showing that the developed theory is useful, since the suggested r-complete convergence conditions hold. These examples may be found in Pergamenchtchikov and Tartakovsky (Submitted in 2016); Tartakovsky (Submitted in 2016).


2.4.  Asymptotic  Pointwise  and  Minimax  Theory  of  Quickest  Changepoint  Detection 

In the area of quickest detection, there are four conventional approaches to the optimum tradeoff problem: Bayesian, generalized Bayesian, multicyclic detection of changes in a stationary regime, and minimax (see Tartakovsky et al. (2014b, Ch. 6)). The Bayesian problem was considered in the previous section where we developed a general Bayesian change detection theory.

By contrast, in a minimax formulation, the change point is assumed to be an unknown non-random number and the goal is to minimize the worst-case delay (with respect to the point of change) subject to a lower bound on the mean time until false alarm. Specifically, Lorden (1971) suggested the worst-worst-case average delay to detection measure

\[
\mathrm{ESADD}(\tau) = \sup_{\nu \ge 0} \operatorname{ess\,sup} \mathsf{E}_\nu(\tau - \nu \mid \tau > \nu, \mathcal{F}_\nu)
\]

that should be minimized in the class of procedures $\mathcal{H}_\gamma = \{\tau : \mathsf{E}_\infty \tau \ge \gamma\}$ for which the average run length (mean time) to false alarm $\mathsf{E}_\infty \tau$ is not smaller than a given number $\gamma > 1$. Here $\tau$ is a generic change detection procedure (stopping time), $\mathsf{E}_\nu$ stands for the operator of expectation when the change point is $\nu$ ($\nu = \infty$ corresponds to a no-change scenario) and $\mathcal{F}_\nu = \sigma(X_1, \dots, X_\nu)$ is the sigma-algebra generated by the first $\nu$ observations $X_1, \dots, X_\nu$. Lorden (1971) developed an asymptotic minimax theory of change detection (in the iid case) as $\gamma \to \infty$, proving in particular that Page's CUSUM procedure is asymptotically first-order minimax. Later Moustakides (1986) established strict optimality of CUSUM for any value of the average run length to false alarm $\gamma > 1$. In the 1980s, Pollak (1985) introduced a less pessimistic worst-case detection delay measure, the maximal conditional average delay to detection,

\[
\mathrm{SADD}(\tau) = \sup_{\nu \ge 0} \mathsf{E}_\nu(\tau - \nu \mid \tau > \nu), \tag{76}
\]

and found an almost optimal procedure that minimizes $\mathrm{SADD}(\tau)$ subject to the constraint on the average run length to false alarm (i.e., in the class $\mathcal{H}_\gamma$) as $\gamma$ becomes large. Pollak's idea was to modify the Shiryaev-Roberts statistic by randomization of the initial condition in order to make it an equalizer. Pollak proved that the randomized Shiryaev-Roberts procedure that starts from a random point sampled from the quasi-stationary distribution of the Shiryaev-Roberts statistic is asymptotically nearly minimax within an additive vanishing term.

In the early stages the theoretical development was focused primarily on the iid case. However, in practice the observations may be non-identically distributed and dependent. A general asymptotic minimax theory of change-point detection for non-iid models was developed by Lai (1995, 1998) (see also Fuh (2003) for hidden Markov models with a finite state-space). In particular, for a low false alarm rate (large $\gamma$) the asymptotic minimaxity of the CUSUM procedure was established in Fuh (2003); Lai (1998).


In the iid case, the suitably standardized distributions of the stopping times of the CUSUM and Shiryaev-Roberts detection procedures are asymptotically exponential for large thresholds and fit well into the geometric distribution even for a moderate false alarm rate (see Pollak and Tartakovsky (2009b)). In this case, the average run length to false alarm is an appropriate measure of false alarms. However, for non-iid models the limiting distribution is not guaranteed to be exponential or even close to it. In general, we cannot even guarantee that large values of the average run length to false alarm will produce small values of the maximal local false alarm probability. Therefore, the average run length to false alarm is not appropriate in general, and instead it is more adequate to use the local conditional false alarm probability, as suggested in Tartakovsky (2005); Tartakovsky et al. (2014b). This issue is extremely important for non-iid models, as a discussion in Mei (2008); Tartakovsky (2008) shows.


In the project, we pursue two objectives. First, we introduce two novel classes of changepoint detection procedures, which, instead of imposing a lower bound on the average run length to false alarm, require more adequate upper bounds on the uniform probability of false alarm or uniform conditional probability of false alarm in the spirit of works by Lai (1998), Tartakovsky (2005) and Tartakovsky et al. (2014b). However, these classes slightly differ from those proposed in Lai (1998); Tartakovsky (2005); Tartakovsky et al. (2014b). This modification allows us to substantially relax Lai's essential supremum conditions (Lai, 1998), which do not hold for certain interesting practical models. In fact, our conditions are equivalent to the uniform version of the complete convergence for the log-likelihood ratio processes, i.e., they are related to the rate of convergence in the strong law of large numbers for the log-likelihood ratio between the “change” and “no-change” hypotheses. We concentrate on a minimax problem of minimizing Pollak's maximal conditional average delay to detection defined in (76) as well as on a pointwise problem of minimizing the conditional average delay to detection $\mathsf{E}_\nu(\tau - \nu \mid \tau > \nu)$ for every change point $\nu \ge 0$. For the sake of completeness, we also consider the other popular risks $\sup_{\nu \ge 0} \mathsf{E}_\nu(\tau - \nu)^+$ and $\mathsf{E}_\nu(\tau - \nu)^+$, $\nu \ge 0$, although we strongly believe that the conditional versions $\mathsf{E}_\nu(\tau - \nu \mid \tau > \nu)$ and (76) are more appropriate for most applications. We consider extremely general non-iid stochastic models for the observations, and it is our goal to find reasonable sufficient conditions for the observation models under which the Shiryaev-Roberts (or CUSUM) procedure is asymptotically optimal. To achieve the first goal we exploit the asymptotic Bayesian theory of changepoint detection developed in the previous section that offers a constructive and flexible approach for studying asymptotic efficiency of Bayesian type procedures. It turns out that a similar method can be used for the analysis of minimax risks and that the complete convergence type conditions for the log-likelihood ratio are also sufficient in the minimax setting. These sufficient conditions as well as the main results related to asymptotic optimality of the Shiryaev-Roberts procedure in the classes of procedures with upper bounds on the weighted false alarm probability and local false alarm probabilities are given below.

The second objective is to find a method for verification of the required sufficient conditions in a number of particular, still very general, challenging models. The natural question is how one may check the proposed sufficient conditions and even whether there are more or less general models, except of course the iid case, for which these conditions hold. To this end, we focus on the class of data models for which one can exploit the method of geometric ergodicity for homogeneous Markov processes. These results can be found in Section 5 of our recently submitted paper Pergamenchtchikov and Tartakovsky (Submitted in 2016) and show that our sufficient conditions for pointwise and minimax optimality hold for homogeneous Markov ergodic processes. In Pergamenchtchikov and Tartakovsky (Submitted in 2016), these conditions are further illustrated for several examples that include autoregressive, autoregressive GARCH, and other models widely used in many applications.

2.4.1.  Novel  Optimality  Criteria 

In this project, we study the Shiryaev-Roberts (SR) procedure given by the following stopping time

\[
T(h) = \inf\left\{ n \ge 1 : \sum_{k=1}^{n} e^{\lambda_n^{k-1}} > h \right\}, \tag{77}
\]

where $h > 0$ is some fixed positive threshold which will be specified later. We set $\inf\{\varnothing\} = +\infty$.

In the iid case, this procedure has certain interesting strict optimality properties (see Pollak and Tartakovsky (2009a) and Tartakovsky et al. (2014b)).
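
A minimal sketch of the SR stopping rule follows, assuming the SR statistic has the standard form $R_n = \sum_{k=1}^{n} \exp(Z_k + \dots + Z_n)$, where $Z_i$ are the LLR increments. Under that assumption the statistic obeys the recursion $R_n = (1 + R_{n-1}) e^{Z_n}$ with $R_0 = 0$. The function names `sr_stopping_time` and `sr_statistic_direct` are hypothetical and chosen only for this illustration.

```python
import math
from typing import Iterable, Optional, Sequence


def sr_stopping_time(llr_increments: Iterable[float], h: float) -> Optional[int]:
    """Return the first n >= 1 with R_n > h, or None if h is never exceeded.

    None plays the role of the convention inf{empty set} = +infinity.
    """
    r = 0.0  # R_0 = 0
    for n, z in enumerate(llr_increments, start=1):
        r = (1.0 + r) * math.exp(z)  # R_n = (1 + R_{n-1}) * exp(Z_n)
        if r > h:
            return n
    return None


def sr_statistic_direct(llr_increments: Sequence[float], n: int) -> float:
    """R_n computed directly as sum_{k=1}^n exp(Z_k + ... + Z_n), for cross-checking."""
    z = list(llr_increments)[:n]
    return sum(math.exp(sum(z[k - 1:])) for k in range(1, n + 1))
```

With all increments equal to zero the statistic is simply $R_n = n$, so `sr_stopping_time([0.0] * 10, 3.5)` stops at the fourth observation.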


Our main goal is to show that the SR detection procedure $T(h)$ is nearly optimal in pointwise and minimax problems described below.

To describe these problems we introduce for any $0 < \beta < 1$, $m^* \ge 1$ and $k^* > m^*$ the following class of change detection procedures

\[
\mathcal{H}^*(\beta, k^*, m^*) = \left\{ \tau : \sup_{1 \le k \le k^* - m^*} \mathsf{P}_\infty(\tau < k + m^* \mid \tau \ge k) \le \beta \right\}. \tag{78}
\]


Note that the probability $\mathsf{P}_\infty(\tau < k + m \mid \tau \ge k) = \mathsf{P}_\infty(k \le \tau < k + m \mid \tau \ge k)$ is the conditional probability of false alarm in the time interval $[k, k + m - 1]$ of the length $m$, which we refer to as the local conditional probability of false alarm (LCPFA).
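
The LCPFA can be estimated from simulated no-change runs by counting. The sketch below is illustrative only: it assumes a sample of stopping times generated under the no-change measure (with `float("inf")` standing for runs that never stopped), and the helper names `empirical_lcpfa` and `max_lcpfa` are hypothetical.

```python
from typing import Sequence


def empirical_lcpfa(stopping_times: Sequence[float], k: int, m: int) -> float:
    """Estimate P(tau < k + m | tau >= k) from no-change stopping times."""
    at_risk = [t for t in stopping_times if t >= k]
    if not at_risk:
        raise ValueError("no runs survived to time k; the estimate is undefined")
    false_alarms = sum(1 for t in at_risk if t < k + m)
    return false_alarms / len(at_risk)


def max_lcpfa(stopping_times: Sequence[float], k_star: int, m_star: int) -> float:
    """Largest empirical LCPFA over the windows starting at k = 1, ..., k_star - m_star."""
    return max(
        empirical_lcpfa(stopping_times, k, m_star)
        for k in range(1, k_star - m_star + 1)
    )
```

A procedure would be judged to satisfy the class constraint empirically when `max_lcpfa` does not exceed the target level; with few surviving runs the estimate is noisy, so this is a diagnostic rather than a proof of membership.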

We consider the conditional detection delay risk

\[
\mathcal{R}^*_\nu(\tau) = \mathsf{E}_\nu(\tau - \nu \mid \tau > \nu) \tag{79}
\]

(compare with (76)) and the following problems: the pointwise minimization, i.e., for any $\nu \ge 0$

\[
\inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \mathcal{R}^*_\nu(\tau); \tag{80}
\]

and the minimax optimization

\[
\inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \max_{0 \le \nu \le k^*} \mathcal{R}^*_\nu(\tau). \tag{81}
\]

The parameters $k^*$ and $m^*$ will be specified later.

In addition, we consider a Bayesian-type problem of minimizing the risk (79) in a class of procedures with the given weighted probability of false alarm.
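
The conditional delay risk and its worst case over change points can be estimated from simulated runs in the same counting style. This is a sketch under stated assumptions: each change point $\nu$ comes with its own sample of stopping times simulated with the change at $\nu$, and the names `conditional_delay` and `worst_case_delay` are hypothetical.

```python
from typing import Dict, Sequence


def conditional_delay(stopping_times: Sequence[float], nu: int) -> float:
    """Estimate E_nu(tau - nu | tau > nu): average delay over runs with no alarm by time nu."""
    delays = [t - nu for t in stopping_times if t > nu]
    if not delays:
        raise ValueError("every run stopped by time nu; the estimate is undefined")
    return sum(delays) / len(delays)


def worst_case_delay(samples_by_nu: Dict[int, Sequence[float]]) -> float:
    """Maximum of the estimated conditional delays over the simulated change points."""
    return max(conditional_delay(times, nu) for nu, times in samples_by_nu.items())
```

For instance, `worst_case_delay({0: [3, 5, 4], 2: [1, 6, 8, 2]})` discards the two runs that raised an alarm by time 2 and compares the averages 4 and 5.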


2.4.2.  Asymptotic  Optimality  of  the  SR  Procedure 

We now proceed with tackling the pointwise and minimax problems (80) and (81) in the class of procedures with given LCPFA. The method of establishing asymptotic optimality of the SR procedure is based on the lower-upper bounding technique. Specifically, we first obtain asymptotic lower bounds for the risk $\mathcal{R}^*_\nu(\tau)$ in the class $\mathcal{H}^*(\beta, k^*, m^*)$, and then we show that these asymptotic lower bounds are attained for the SR procedure $T(h)$ with a certain threshold $h = h^*_\beta$.

We do not assume any particular model or even class of models for the observations, and as a result, there is no “structure” of the LLR process. We therefore have to impose some conditions on the behavior of the LLR process at least for large $n$. It is natural to assume that there exists a positive finite number $I$ such that $\lambda_n^k/(n - k)$ converges almost surely to $I$ under $\mathsf{P}_k$, i.e.,

(A1) Assume that there exists a number $I > 0$ such that for any $k \ge 0$

\[
\frac{1}{n} \lambda_{k+n}^k \xrightarrow[n \to \infty]{} I \quad \mathsf{P}_k\text{-a.s.} \tag{82}
\]

This is always true for iid data models with

\[
I = \mathsf{E}_0 \lambda_1^0 = \int \log\left[\frac{f_0(x)}{f_\infty(x)}\right] f_0(x)\, d\mu(x)
\]

being the Kullback-Leibler information number. It turns out that the a.s. convergence condition (82) is sufficient for obtaining lower bounds for all positive moments of the detection delay.

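Next, for any $0 < \beta < 1$, $m^* \ge 1$ and $k^* > m^*$, define

\[
\alpha_1 = \alpha_1(\beta, m^*) = \beta + (1 - \rho_{1,\beta})^{m^* + 1} \tag{83}
\]

and

\[
\alpha_3 = \alpha_3(\beta, k^*) = \ldots \tag{84}
\]

where

\[
\rho_{1,\beta} = \frac{1}{1 + |\log \beta|} \quad \text{and} \quad \rho_{2,\beta} = \frac{\rho_{1,\beta}}{1 + |\log |\log \beta||}. \tag{85}
\]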

To find asymptotic lower bounds for the problems (80) and (81), in addition to condition (A1) we impose the following condition related to the growth of the window size $m^*$ in the LCPFA:

(H1) The size of the window $m^*$ in (83) is a function of $\beta$, i.e. $m^* = m^*_\beta$, such that

\[
\lim_{\beta \to 0} \frac{|\log \alpha_{1,\beta}|}{|\log \beta|} = 1, \tag{86}
\]

where $\alpha_{1,\beta} = \alpha_1(\beta, m^*_\beta)$.

For example, we can take $m^*_\beta = 1 + \lfloor (1 + |\log \beta|)^2 \rfloor$.

The following theorem establishes asymptotic lower bounds.

Theorem 10. Assume that conditions (A1) and (H1) hold. Then, for any $k^* > m^*$ and $\nu \ge 0$,

\[
\liminf_{\beta \to 0} \frac{1}{|\log \beta|} \inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \sup_{\nu \ge 0} \mathcal{R}^*_\nu(\tau) \ge \liminf_{\beta \to 0} \frac{1}{|\log \beta|} \inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \mathcal{R}^*_\nu(\tau) \ge \frac{1}{I}. \tag{87}
\]
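
The example window size given for condition (H1) can be checked numerically. The sketch below rests on an assumption that should be stated loudly: the display defining $\alpha_1$ is damaged in the source, and the code takes it to be $\alpha_1 = \beta + (1 - \rho_{1,\beta})^{m^*+1}$ with $\rho_{1,\beta} = 1/(1 + |\log \beta|)$. The function names are hypothetical.

```python
import math


def window_size(beta: float) -> int:
    """Example window m* = 1 + floor((1 + |log beta|)^2) from condition (H1)."""
    return 1 + math.floor((1.0 + abs(math.log(beta))) ** 2)


def alpha1(beta: float, m_star: int) -> float:
    """ASSUMED form of alpha_1: beta + (1 - rho_1)^(m* + 1), rho_1 = 1 / (1 + |log beta|)."""
    rho1 = 1.0 / (1.0 + abs(math.log(beta)))
    # log1p keeps the power accurate when rho1 is small
    return beta + math.exp((m_star + 1) * math.log1p(-rho1))


def h1_ratio(beta: float) -> float:
    """|log alpha_1| / |log beta|, which (H1) requires to tend to 1 as beta -> 0."""
    return abs(math.log(alpha1(beta, window_size(beta)))) / abs(math.log(beta))


if __name__ == "__main__":
    for beta in (1e-2, 1e-4, 1e-8, 1e-16):
        print(beta, window_size(beta), h1_ratio(beta))
```

Under the assumed form, $(1 - \rho_{1,\beta})^{m^*+1} \le \exp(-\rho_{1,\beta}(m^*+1)) < \beta/e$ because $m^* + 1 > (1 + |\log \beta|)^2$, so $\beta < \alpha_{1,\beta} < \beta(1 + e^{-1})$ and the ratio approaches 1 at rate $O(1/|\log \beta|)$.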


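In order to study asymptotics for the average detection delay of the SR procedure and for establishing its asymptotic optimality, we impose the following constraint on the rate of convergence for

\[
\widetilde{\lambda}_{k,n} = \frac{1}{n} \lambda_{k+n}^k - I. \tag{88}
\]

(A2) Assume that $\widetilde{\lambda}_{k,n}$ converges uniformly completely to 0 as $n \to \infty$, i.e., for any $\varepsilon > 0$

\[
\Upsilon^*(\varepsilon) = \sum_{n=1}^{\infty} \sup_{k \ge 0} \mathsf{P}_k\left\{ |\widetilde{\lambda}_{k,n}| > \varepsilon \right\} < \infty. \tag{89}
\]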
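
As a sanity check of what complete convergence asks for, consider a simple iid Gaussian mean-shift model, which is an assumption made only for this illustration: observations are N(0, 1) before the change and N(theta, 1) after it. Then the LLR increments are N(theta^2/2, theta^2) after the change, the normalized LLR minus $I = \theta^2/2$ is N(0, theta^2/n), and the tail probabilities do not depend on the change point, so the supremum over $k$ is trivial. The function names below are hypothetical.

```python
import math


def tail_prob(n: int, eps: float, theta: float) -> float:
    """P(|N(0, theta^2 / n)| > eps) = erfc(eps * sqrt(n) / (|theta| * sqrt(2)))."""
    return math.erfc(eps * math.sqrt(n) / (abs(theta) * math.sqrt(2.0)))


def complete_convergence_sum(eps: float, theta: float, r: float = 1.0,
                             n_max: int = 100_000) -> float:
    """Partial sum of n^(r-1) * P(|deviation| > eps); r = 1 is plain complete convergence."""
    return sum(n ** (r - 1.0) * tail_prob(n, eps, theta) for n in range(1, n_max + 1))


def geometric_bound(eps: float, theta: float) -> float:
    """Upper bound for r = 1 from erfc(y) <= exp(-y^2): sum_n exp(-n eps^2 / (2 theta^2))."""
    return 1.0 / math.expm1(eps ** 2 / (2.0 * theta ** 2))
```

In this model the terms decay exponentially in $n$, so the series is finite for every $\varepsilon > 0$ and every weight $n^{r-1}$; the conditions in the text are of interest precisely because such decay is not automatic for dependent data.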


To establish asymptotic optimality properties of the SR procedure with respect to the risks $\mathcal{R}^*_\nu(\tau)$ (for all $\nu \ge 0$) and $\sup_{\nu \ge 0} \mathcal{R}^*_\nu(\tau)$ in the class $\mathcal{H}^*(\beta, k^*, m^*)$ we need the uniform complete convergence condition (A2) as well as the following condition.


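(H2) Parameters $k^*$ and $m^*$ are functions of $\beta$, i.e. $k^* = k^*_\beta$ and $m^* = m^*_\beta$, such that

\[
\lim_{\beta \to 0} \left( |\log \alpha_{3,\beta}| + k^*_\beta \log(1 - \rho_{1,\beta}) \right) = +\infty \quad \text{and} \quad \lim_{\beta \to 0} \frac{|\log \alpha_{3,\beta}|}{|\log \beta|} = 1, \tag{90}
\]

where $\alpha_{3,\beta} = \alpha_3(\beta, k^*_\beta)$.

We can take, for example, the parameters $k^* = k^*_\beta$ and $m^* = m^*_\beta$ as

\[
m^*_\beta = 1 + \lfloor (1 + |\log \beta|)^2 \rfloor \quad \text{and} \quad k^*_\beta = 2 m^*_\beta. \tag{91}
\]

Denote by $T^*_\beta$ the SR procedure $T(h^*_\beta)$ defined in (77) with the threshold $h^*_\beta$ given by

\[
h^*_\beta = \frac{1 - \alpha_{3,\beta}}{\rho_{2,\beta}\, \alpha_{3,\beta}}. \tag{92}
\]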

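Theorem 11. If conditions (H1) and (H2) hold, then, for any $0 < \beta < 1$, the SR procedure $T^*_\beta$ with the threshold $h^*_\beta$ given by (92) belongs to the class $\mathcal{H}^*(\beta, k^*, m^*)$. Assume that in addition condition (A2) is satisfied. Then the SR procedure $T^*_\beta$ is first-order asymptotically uniformly pointwise optimal and minimax in the class $\mathcal{H}^*(\beta, k^*, m^*)$, i.e.,

\[
\lim_{\beta \to 0} \frac{\inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \mathcal{R}^*_\nu(\tau)}{\mathcal{R}^*_\nu(T^*_\beta)} = 1 \quad \text{for all fixed } \nu \ge 0 \tag{93}
\]

and

\[
\lim_{\beta \to 0} \frac{\inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \max_{0 \le \nu \le k^*} \mathcal{R}^*_\nu(\tau)}{\max_{0 \le \nu \le k^*} \mathcal{R}^*_\nu(T^*_\beta)} = 1. \tag{94}
\]

Also, as $\beta \to 0$, the following first-order asymptotic approximations hold for the pointwise and maximal risks:

\[
\mathcal{R}^*_\nu(T^*_\beta) \sim \inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \mathcal{R}^*_\nu(\tau) \sim \frac{|\log \beta|}{I} \quad \text{for any } \nu \ge 0 \tag{95}
\]

and

\[
\sup_{0 \le \nu \le k^*_\beta} \mathcal{R}^*_\nu(T^*_\beta) \sim \inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \sup_{0 \le \nu \le k^*_\beta} \mathcal{R}^*_\nu(\tau) \sim \frac{|\log \beta|}{I}. \tag{96}
\]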

The results of Theorem 11 can be extended to higher moments of the detection delay by strengthening the complete convergence with the uniform r-complete convergence. More specifically, the following asymptotic optimality result holds true.


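Theorem 12. Assume that conditions (H1) and (H2) hold, and in addition, for some $r > 1$ the uniform r-complete convergence condition

\[
\sum_{n=1}^{\infty} n^{r-1} \sup_{k \ge 0} \mathsf{P}_k\left\{ |\widetilde{\lambda}_{k,n}| > \varepsilon \right\} < \infty \quad \text{for all } \varepsilon > 0 \tag{97}
\]

is satisfied. Then, for any $0 < \beta < 1$, the SR procedure $T^*_\beta$ with the threshold $h^*_\beta$ given by (92) belongs to the class $\mathcal{H}^*(\beta, k^*, m^*)$ and as $\beta \to 0$ for any $0 < \ell \le r$

\[
\mathsf{E}_\nu\left[ (T^*_\beta - \nu)^\ell \mid T^*_\beta > \nu \right] \sim \inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \mathsf{E}_\nu\left[ (\tau - \nu)^\ell \mid \tau > \nu \right] \sim \left( \frac{|\log \beta|}{I} \right)^\ell \quad \text{for all } \nu \ge 0 \tag{98}
\]

and

\[
\max_{0 \le \nu \le k^*_\beta} \mathsf{E}_\nu\left[ (T^*_\beta - \nu)^\ell \mid T^*_\beta > \nu \right] \sim \inf_{\tau \in \mathcal{H}^*(\beta, k^*, m^*)} \max_{0 \le \nu \le k^*_\beta} \mathsf{E}_\nu\left[ (\tau - \nu)^\ell \mid \tau > \nu \right] \sim \left( \frac{|\log \beta|}{I} \right)^\ell. \tag{99}
\]

Therefore, the SR procedure $T^*_\beta$ is first-order asymptotically uniformly pointwise optimal and also minimax in the class $\mathcal{H}^*(\beta, k^*, m^*)$ with respect to the moments of the detection delay up to order $r$.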


3.  POTENTIAL  IMPACTS 

The research produces general theories of sequential hypothesis testing and quickest changepoint detection for very general non-iid stochastic models, as well as novel nearly optimal tests of composite hypotheses and changepoint detection procedures that significantly impact the effectiveness of DoD in recognizing unusual patterns of activity in heterogeneous volumes of data and automatic threat detection. We believe that our research results in practical and scalable algorithms for on-line detection and recognition of threats, in particular in cyber security applications related to rapid detection of intrusions in computer networks with very low false alarm rates.

2. A.G. Tartakovsky, Nearly Optimal Sequential Tests of Composite Hypotheses Revisited, Proceedings of the Steklov Institute of Mathematics, Vol. 287, pp. 268-288, 2014 (Invited Paper for Special Issue in honor of the 80th birthday of Professor Albert Shiryaev).

3. BOOK: A.G. Tartakovsky, I. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Changepoint Detection (Monographs on Statistics and Applied Probability). Chapman & Hall/CRC, Boca Raton, FL, 2014.

4. BOOK CHAPTER: A.G. Tartakovsky, Rapid Detection of Attacks in Computer Networks by Quickest Changepoint Detection Methods (in N. Adams and N. Heard, editors, Data Analysis for Network Cyber-Security), pages 33-70. Imperial College Press, London, UK, 2014.

(c)  Papers  published  in  non-peer-reviewed  journals  or  in  conference  proceedings:  None 

(d)  Papers  presented  at  meetings,  but  not  published  in  conference  proceedings: 

1.  A.G.  Tartakovsky  (joint  with  G.  Fellouris),  Optimal  Unstructured  Sequential  Detection  in 
Multiple  Channel  Systems,  The  7th  International  Workshop  on  Applied  Probability,  Antalya, 
Turkey,  June  16-19,  2014  (Invited). 




FTR ARO Grant # W911NF-14-1-0246: General Multidecision Theory: Hypothesis Testing and Changepoint Detection with Applications to Homeland Security


(2)  Demographic  Data  for  this  Reporting  Period: 

(a)  Number  of  Manuscripts  submitted:  7 

(b)  Number  of  Peer  Reviewed  Papers:  5 

(c)  Number  of  books  and/or  book  chapters  submitted  or  published:  2 

(c)  Number  of  Non-Peer  Reviewed  Papers  submitted  during  this  reporting  period:  0 

(d)  Number  of  Presented  but  not  Published  Papers  submitted  during  this  reporting  period:  1 

(3)  Demographic  Data  for  the  life  of  this  agreement: 

(a)  Number  of  Scientists  Supported  by  this  agreement:  3 

(b)  Number  of  Inventions  resulting  from  this  agreement:  0 

(c)  Number  of  PhD(s)  awarded  as  a  result  of  this  agreement:  0 

(d)  Number  of  Bachelor  Degrees  awarded  as  a  result  of  this  agreement:  0 

(e)  Number  of  Patents  Submitted  as  a  result  of  this  agreement:  0 

(f)  Number  of  Patents  Awarded  as  a  result  of  this  agreement:  0 

(g)  Number  of  Grad  Students  supported  by  this  agreement:  2 

(h)  Number  of  FTE  Grad  Students  supported  by  this  agreement:  0 

(i)  Number  of  Post  Doctorates  supported  by  this  agreement:  0 

(j)  Number  of  FTE  Post  Doctorates  supported  by  this  agreement:  0 

(k)  Number  of  Faculty  supported  by  this  agreement:  1 

(l)  Number  of  Other  Staff  supported  by  this  agreement:  0 

(m) Number of Undergrads supported by this agreement: 0

(n)  Number  of  Master  Degrees  awarded  as  a  result  of  this  agreement:  0 

(4)  Student  Metrics  for  graduating  undergraduates  funded  by  this  agreement 

(a)  Number  of  undergraduates  funded  by  your  agreement  during  this  reporting  period:  0 

(b) Number of undergraduates funded by your agreement, who graduated during this period: 0

(c)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period 
with  a  degree  in  a  science,  mathematics,  engineering,  or  technology  field:  0 

(d)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period 
and  will  continue  to  pursue  a  graduate  or  Ph.D  degree  in  a  science,  mathematics,  engineering,  or 
technology  field:  0 

(e)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period 
and  intend  to  work  for  the  Defense  Department:  0 

(f)  Number  of  undergraduates  graduating  during  this  period,  who  achieved  at  least  a  3.5  GPA 
based  on  a  scale  with  a  maximum  of  a  4.0  GPA.  (Convert  GPAs  on  any  other  scale  to  be  an 
equivalent  value  on  a  4.0  scale.):  0 

(g)  Number  of  undergraduates  working  on  your  agreement,  who  graduated  during  this  period 
and  were  funded  by  a  DoD  funded  Center  of  Excellence  for  Education,  Research  or  Engineering: 
0 

(h)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period  and 
will  receive  a  scholarship  or  fellowship  for  further  studies  in  a  science,  mathematics,  engineering 
or  technology  field:  0 

(5) Report of inventions

None.